System, method and computer program product for generating remote views in a virtual mobile device platform using efficient processing during display encoding

ABSTRACT

Embodiments disclosed herein provide systems, methods and computer readable media for generating remote views in a virtual mobile device platform. A virtual mobile device platform may be coupled to a physical mobile device over a network and generate frames of data for generating views on the physical device. These frames can be generated using an efficient display encoding pipeline on the virtual mobile device platform. Such efficiencies may include, for example, the synchronization of various processes or operations, the governing of various processing rates, the elimination of duplicative or redundant processing, the application of different encoding schemes, the efficient detection of duplicative or redundant data or the combination of certain operations.

RELATED APPLICATIONS

This application claims a benefit of priority under 35 U.S.C. 119 of the filing date of U.S. Patent Application Ser. No. 62/367,867, by inventors Lee et al., entitled “SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT FOR GENERATING REMOTE VIEWS IN A VIRTUAL MOBILE DEVICE PLATFORM USING EFFICIENT PROCESSING DURING DISPLAY ENCODING”; U.S. Patent Application Ser. No. 62/367,871, by inventors Lee et al., entitled “SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT FOR GENERATING REMOTE VIEWS IN A VIRTUAL MOBILE DEVICE PLATFORM USING EFFICIENT MACROBLOCK COMPARISON DURING DISPLAY ENCODING, INCLUDING EFFICIENT DETECTION OF UNCHANGED MACROBLOCKS”; and U.S. Patent Application Ser. No. 62/367,876 by inventors Lee et al., entitled “SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT FOR GENERATING REMOTE VIEWS IN A VIRTUAL MOBILE DEVICE PLATFORM USING EFFICIENT COLOR SPACE CONVERSION AND FRAME ENCODING,” all filed on Jul. 28, 2016, and expressly incorporated by reference for all purposes.

TECHNICAL FIELD

This disclosure relates generally to a virtual mobile device platform for mobile devices. In particular, embodiments disclosed herein relate to systems, methods, and computer readable media for generating remote views in a virtual mobile device platform. More particularly, embodiments disclosed herein relate to systems, methods and computer readable media for display encoding pipelines used for generating remote views in a virtual mobile device platform. Even more specifically, embodiments disclosed relate to display encoding pipelines that implement efficient processing.

BACKGROUND

Today's mobile devices such as smart phones and tablets face unique security issues, some of which go hand in hand with mobility. Enterprises, military, and intelligence agencies (collectively referred to herein as “organizations”) are all grappling with their users' use of mobile devices as many users are carrying out both business as well as personal activities on their mobile devices. This can be problematic even if a Bring Your Own Device (“BYOD”) device policy is in place.

BYOD can raise serious security issues when a user's personal device is used to access both non-sensitive and sensitive (and sometimes risky) networks and/or services. For example, if an employee uses his personal smartphone to access a company network and then loses that phone, untrusted parties could retrieve any unsecured data on the phone. Another type of security breach occurs when an employee leaves a company: because she does not have to give the company back her personal device, company-owned applications and other data may still be present on her personal device. A challenging but important task for organizations that utilize BYOD is to develop a policy that defines exactly what sensitive company information needs to be protected and which employees should have access to this information, and then to educate all employees on this policy. Commercial carriers are normally relied upon for implementing the security requirements of an organization's BYOD policy.

Because of Internet-based risks, some very risk-averse organizations issue devices specifically for Internet use (this is termed “Inverse-BYOD”), providing unfiltered access to the Internet and reserving filtered, sensitive network data for use within a secured, private network. However, this means that a user likely has to carry multiple devices (including one for his personal use) and organizations do not have a sure way of preventing the user from using his personal mobile device to communicate non-sensitive but company-related information. As such, organizations continue to search for solutions that allow mobile services to be delivered or shared within a single device, rather than having to issue their users multiple devices or separate devices for their personal use and locking them into private networks.

Finding viable solutions to handle mobile devices can be particularly challenging for organizations that operate in high assurance computing environments. A high assurance computing environment is one that provides a certain level of assurance as to its behavior, useful in ensuring a level of secrecy for classified information. For instance, a high assurance operating system may permit only certain certified applications to access a particular portion of a memory on a device where sensitive information is stored. However, this does not prevent the physical device itself from becoming suspect: how it was built, who has handled it from manufacturing through use, how it is used by the user, etc. Moreover, the device could be physically accessed or otherwise compromised in many ways. For instance, information stored or cached on a mobile device could be accessed while its owner is away (e.g., left on the table at a restaurant or on their desk at work, stolen, or lost), or the user may have downloaded an infected application, could be sent an infected document via email or instant messaging, or could have accessed an infected service.

Because a mobile device lives in a hostile world, securing the physical device itself (e.g., via Tempest hardware, encrypted storage, biometrics, etc.) is not enough, and doing a thorough job can be very expensive. Even so, infiltration from any portion of the stack (from the chips to the software that is installed to the data the device receives) still leaves the device vulnerable to attacks from well-funded, motivated adversaries. Attempts to provide the level of separation needed within the actual device face many challenges, and at best are likely to become a very expensive niche proposition in the overall commercial mobility ecosystem.

In view of unique challenges in incorporating mobile devices such as smart phones and tablets into secure computing environments, there is room for innovations and improvements.

SUMMARY

To address those desires, amongst others, embodiments as disclosed herein may be used to provide a system, method, and computer program product for generating remote views in a virtual mobile device platform. In some embodiments, events from a physical mobile device are sent to a virtual device. The virtual device creates one or more views based on the received events. Graphical attributes of one or more of the created views are captured and sent to the physical mobile device. Remote views are constructed and displayed on the physical mobile device based on the attributes received from the virtual device. For views where graphical attributes are not captured, compressed video of the respective views can be sent to the physical mobile device. Embodiments disclosed herein can provide many advantages. For example, in some embodiments, generating remote views using graphical attributes improves the responsiveness of remote applications, as well as reduces video bandwidth consumption.

However, the generation of these remote views is not without its challenges. In particular, as the events are sent from a physical mobile device to the virtual mobile device platform over a network and remote views are generated at the virtual mobile device platform and sent to the physical mobile device to be rendered over the network, embodiments may be particularly sensitive to latency that may be introduced at various points during the reception and processing of events, the generation of the remote views, or the transmission of these remote views to the physical mobile device. Though embodiments as disclosed are aimed, at least in part, at providing a greater level of security for the physical mobile device, it is desirable that the greater level of security imposes little or no burden with respect to the usability or functionality of the physical mobile devices by the user.

In fact, it would be ideal if the operation of the user's physical mobile device in a virtual mobile device platform was undetectable to the user. On a physical device in a user's hand, there is essentially zero latency between when the user generates events and when a locally installed application receives them. Similarly, there is essentially zero latency between when a locally installed application produces new visual content and when the display offers it to the user. While such a goal may not be achievable in all instances or circumstances, embodiments as disclosed herein may address these goals, among others, by reducing latency or use of computational resources involved in the generation of remote views on the virtual mobile device platform and the transmission of these remote views to the physical mobile device.

In one embodiment, an efficient display encoding pipeline may be implemented at the virtual mobile device platform to process display frames generated by a guest operating system (OS) executing on a virtual device executing on a virtual machine in the virtual mobile device platform. The display encoding pipeline may include a pre-processor operating in a display thread of the virtual machine and a display encoder. The display system of the guest OS may generate a display frame including pixel data in an RGB color space. This pixel data may be grouped or organized into a set of macroblocks. The pre-processor may perform a number of tasks on the display frame generated by the display system of the guest OS including converting the frame to a YUV color space. The display encoder of the display pipeline may encode the converted frame and send the encoded frame to the physical mobile device, where it may be presented on the physical mobile device.
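As a rough illustration of the color space conversion step described above, the following Python sketch converts 8-bit RGB pixels to YUV. The disclosure does not state which conversion matrix is used; the full-range BT.601 coefficients below, along with the function names `rgb_to_yuv` and `convert_frame`, are assumptions made only for this example.

```python
def rgb_to_yuv(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YUV.

    Assumption: full-range BT.601 coefficients (as used by JFIF); the
    source text does not specify a particular matrix.
    """
    def clamp(value):
        # Round to the nearest integer and keep the result in 0..255.
        return max(0, min(255, int(round(value))))

    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return clamp(y), clamp(u), clamp(v)


def convert_frame(rgb_pixels):
    """Convert an iterable of (r, g, b) tuples to a list of (y, u, v) tuples."""
    return [rgb_to_yuv(r, g, b) for (r, g, b) in rgb_pixels]
```

A production pre-processor would more likely operate on whole macroblocks with vectorized or fixed-point arithmetic; the per-pixel form here is chosen only for readability.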

To achieve efficiencies, a number of optimizations may be implemented within this display pipeline according to certain embodiments. For example, in one embodiment, the frame generation of the display system of the guest OS may be synchronized with the output of the display encoder by using the output of an encoded frame by the display encoder of the pipeline to unblock the display system of the guest OS by signaling the display system's capability to cause the display system of the guest OS to generate another frame.

Similarly, in one embodiment, the display thread of the virtual machine in which a pre-processing component of the display encoding pipeline is being executed may be synchronized with the output of the display system of the guest OS such that the display thread of the virtual machine platform only performs processing when triggered by the output of a display frame by the display system of the guest OS.

To further synchronize the components of the display encoding pipeline of the virtual mobile device platform, in some embodiments a frame rate governor may also be utilized to reduce the processing caused by, for example, applications that repeatedly generate duplicative frames. For example, a governor can detect and compare frames from an application and throttle the frame processing for the application if one or more duplicative frames are generated within a particular time frame. The throttling can be graduated or staged depending on, for example, the number of duplicative frames or the time period.

Moreover, in some embodiments, the type of data in each macroblock of a frame may be detected. Different encoding schemes may be applied to different macroblocks based on the different types of data detected. In this manner, the data can be more efficiently encoded, both from a compression standpoint (e.g., better compression may be achieved) and from a computational resources standpoint (e.g., fewer computational resources may be required to perform such encoding).

These types of efficiencies may also be achieved in certain embodiments by employing efficient color space conversion and encoding of the display frame. In particular, in certain embodiments macroblocks of a display frame that are unchanged or have not moved (referred to as Zero Motion Vector or ZMV macroblocks) with respect to a previous frame may be detected. Based on the detection of these ZMV macroblocks, color space conversion and encoding may not be performed. Instead, a previously encoded version of such a macroblock may be utilized, obviating the need to (re)perform color space conversion or encoding of those macroblocks. Accordingly, the use of computing resources that would be required to perform such (re)converting or encoding may be avoided.
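The reuse of previously encoded macroblocks described above might be sketched as follows in Python. This is a minimal sketch under stated assumptions: frames are modeled as lists of `bytes` objects (one per macroblock), `encode_block` is a hypothetical callable standing in for color space conversion plus encoding, and the function name `encode_frame` is illustrative rather than taken from the disclosure.

```python
def encode_frame(current_blocks, previous_blocks, previous_encoded, encode_block):
    """Encode a frame, reusing cached output for unchanged (ZMV) macroblocks.

    current_blocks   -- list of bytes, one entry per macroblock of the new frame
    previous_blocks  -- the same for the prior frame, or None for the first frame
    previous_encoded -- encoded output for the prior frame's macroblocks
    encode_block     -- hypothetical callable: convert and encode one macroblock

    Returns (encoded_blocks, reused_count).
    """
    encoded = []
    reused = 0
    for index, block in enumerate(current_blocks):
        unchanged = (
            previous_blocks is not None
            and index < len(previous_blocks)
            and block == previous_blocks[index]
        )
        if unchanged:
            # ZMV macroblock: skip conversion and encoding entirely.
            encoded.append(previous_encoded[index])
            reused += 1
        else:
            encoded.append(encode_block(block))
    return encoded, reused
```

The returned `reused_count` is included only to make the saving observable; the first frame has no predecessor, so every macroblock is encoded.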

While almost any method desired may be utilized to detect such ZMV macroblocks, in one embodiment, the detection of these ZMV macroblocks may also be performed in an efficient manner by efficiently making memory comparisons of macroblock data between current frame data and previous frame data using a particular set of instructions to effect a line by line comparison. By comparing and conditionally copying in this manner, what would be a separate compare and a separate copy step is essentially transformed into a conjoined compare and copy step.
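The conjoined compare and copy step can be illustrated with a short Python sketch. The disclosure refers to a particular set of instructions for the line by line comparison; the sketch below only models the control flow with buffer slices and is not that instruction sequence. The function name `compare_and_copy`, the parameter names, and the default macroblock size of 16 pixels are assumptions for illustration.

```python
def compare_and_copy(current, previous, stride, x, y, size=16, bytes_per_pixel=4):
    """Compare one macroblock line by line and copy lines that differ.

    current  -- bytes-like object holding the current frame
    previous -- bytearray holding the previous frame (updated in place)
    stride   -- number of bytes per scanline of the frame
    x, y     -- top-left pixel coordinate of the macroblock
    size     -- macroblock edge length in pixels (assumed square)

    Returns True if any line of the macroblock changed.
    """
    changed = False
    line_bytes = size * bytes_per_pixel
    for row in range(y, y + size):
        start = row * stride + x * bytes_per_pixel
        end = start + line_bytes
        if current[start:end] != previous[start:end]:
            # The copy happens only for lines that differ, so the compare
            # pass doubles as the update of the previous-frame buffer.
            previous[start:end] = current[start:end]
            changed = True
    return changed
```

After the call, the macroblock region of `previous` matches `current`, so a separate copy pass over the frame is not needed; a return value of `False` marks the macroblock as unchanged.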

Accordingly, embodiments as disclosed herein may achieve a number of efficiencies including reduction in the amount of computational resources consumed (e.g., use of CPU cycles or memory) and better compression (e.g., smaller size or higher quality data for the same relative size) which may, in turn, result in reduced latency or lower bandwidth usage. Such efficiencies may be especially useful in the implementation of display encoding pipelines in virtual mobile device platforms.

Specifically, in one embodiment, a system for a virtual mobile device platform with efficient frame processing can include a virtual mobile device platform coupled to a physical mobile device over a network where the virtual mobile device platform includes a processor executing instructions on a non-transitory computer readable medium for implementing a virtual machine. The virtual machine may execute a virtual mobile device associated with a physical mobile device communicating with the virtual mobile device platform over the network, the virtual mobile device including a guest operating system (OS) and one or more applications executing on the guest OS. The guest OS generates a frame of display data from an application executing on the guest OS. The virtual mobile device may include a video encoder including an input/output (I/O) thread for generating a converted frame by performing color space conversion on the frame of display data generated by the guest OS and a display encoder for generating an encoded frame by encoding the converted frame generated by the I/O thread. The generation of the frame of display data by the guest OS is synchronized to the generation of the encoded frame by the video encoder and the encoded frame is sent to the physical mobile device by the virtual mobile device platform.

In certain embodiments, the video encoder is executing on the guest operating system or the generation of the frame by the guest OS is blocked after the frame is generated.

In another embodiment, the guest OS includes a display control process that blocks after the generation of the frame and the display control process includes a display control synchronizer responsive to the output of the video encoder such that the generation of the encoded frame causes the display control synchronizer to unblock the display control process. The display control synchronizer can include, for example, a mutex or a semaphore.
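One way to picture the blocking and unblocking behavior described above is the following Python sketch, in which a semaphore plays the role of the display control synchronizer. The thread functions, the queue used to hand frames to the encoder, and the name `run_pipeline` are all hypothetical and stand in for the guest OS display control process and the video encoder.

```python
import queue
import threading


def run_pipeline(num_frames):
    """Generate frames only as fast as the encoder emits encoded frames."""
    can_generate = threading.Semaphore(1)  # stand-in for the display control synchronizer
    pending = queue.Queue()                # frames handed from the display to the encoder
    encoded = []
    backlog = []

    def guest_display():
        for i in range(num_frames):
            # Block until the previously generated frame has been encoded.
            can_generate.acquire()
            pending.put(f"frame-{i}")
            backlog.append(pending.qsize())
        pending.put(None)  # sentinel: no more frames

    def display_encoder():
        while True:
            frame = pending.get()
            if frame is None:
                break
            encoded.append(f"encoded({frame})")
            # Output of an encoded frame unblocks the display control process.
            can_generate.release()

    threads = [
        threading.Thread(target=guest_display),
        threading.Thread(target=display_encoder),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return encoded, max(backlog, default=0)
```

Because the semaphore starts at one and is released only after an encode completes, at most one unencoded frame is ever queued, which is the effect the synchronization is meant to achieve.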

In yet other embodiments, the guest OS includes a display control process that controls the generation of the frame of display data, and the generation of the frame is blocked by configuring the VSYNC of the display control process according to a timer. In some embodiments, the I/O thread of the video encoder is blocked until the frame of display data is generated by the guest OS. As an example, the guest OS may include a display control process having a display control synchronizer, and the I/O thread includes an I/O synchronizer for blocking the I/O thread after the generation of the converted frame and unblocking the I/O thread after receiving a notification from the display control synchronizer that the frame was generated.

In one embodiment, a system for a virtual mobile device platform with efficient frame processing can include a virtual mobile device platform coupled to a physical mobile device over a network where the virtual mobile device platform includes a processor executing instructions on a non-transitory computer readable medium for implementing a virtual machine. The virtual machine may execute a virtual mobile device associated with a physical mobile device communicating with the virtual mobile device platform over the network, the virtual mobile device including a guest operating system (OS) and one or more applications executing on the guest OS. The guest OS generates a frame of display data from an application executing on the guest OS. The virtual mobile device may include a video encoder including an input/output (I/O) thread for generating a converted frame by performing color space conversion on the frame of display data generated by the guest OS and a display encoder for generating an encoded frame by encoding the converted frame generated by the I/O thread. The I/O thread includes a frame rate governor for governing the rate at which the I/O thread generates converted frames based on a detection of duplicative frames generated by the guest OS. The encoded frame is sent to the physical mobile device by the virtual mobile device platform.

In a particular embodiment, the frame generated by the guest OS includes a first frame and a second frame, and the governor compares the first frame to the second frame to detect a duplicative frame. The governor can maintain an identical frame counter and slow the rate of the I/O thread to a first rate when the identical frame counter reaches a first threshold. Additionally, the governor may slow the rate of the I/O thread to a second rate when the identical frame counter reaches a second threshold.
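The staged throttling described above might look like the following Python sketch. All numeric values (the rates of 60, 30 and 10 frames per second and the thresholds of 3 and 10 identical frames) are assumptions chosen for illustration, as are the class name `FrameRateGovernor` and the method name `observe`; the disclosure does not give specific values here.

```python
class FrameRateGovernor:
    """Slow frame processing in stages as identical frames accumulate."""

    def __init__(self, full_rate=60, first_rate=30, second_rate=10,
                 first_threshold=3, second_threshold=10):
        # Every default above is an illustrative assumption.
        self.full_rate = full_rate
        self.first_rate = first_rate
        self.second_rate = second_rate
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self.identical_frames = 0  # the identical frame counter
        self._previous = None

    def observe(self, frame):
        """Record a frame (bytes) and return the rate to use for the I/O thread."""
        if self._previous is not None and frame == self._previous:
            self.identical_frames += 1
        else:
            # A new, different frame restores the full processing rate.
            self.identical_frames = 0
        self._previous = frame

        if self.identical_frames >= self.second_threshold:
            return self.second_rate
        if self.identical_frames >= self.first_threshold:
            return self.first_rate
        return self.full_rate
```

Resetting the counter on the first differing frame is a design choice of this sketch, made so that an application that resumes drawing is immediately processed at the full rate again.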

These, and other, aspects of the disclosure will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following description, while indicating various embodiments of the disclosure and numerous specific details thereof, is given by way of illustration and not of limitation. Many substitutions, modifications, additions and/or rearrangements may be made within the scope of the disclosure without departing from the spirit thereof, and the disclosure includes all such substitutions, modifications, additions and/or rearrangements.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings accompanying and forming part of this specification are included to depict certain aspects of the disclosure. It should be noted that the features illustrated in the drawings are not necessarily drawn to scale. A more complete understanding of the disclosure and the advantages thereof may be acquired by referring to the following description, taken in conjunction with the accompanying drawings in which like reference numbers indicate like features and wherein:

FIG. 1 depicts a diagrammatic representation of an example of an overall network environment in which embodiments disclosed herein may be implemented;

FIG. 2 depicts a diagrammatic representation of an example of a network architecture according to one embodiment;

FIG. 3 depicts a diagrammatic representation of an example of a system architecture according to one embodiment;

FIG. 4 depicts a diagrammatic representation of an example of virtual device containment and connections according to one embodiment;

FIG. 5 depicts a diagrammatic representation of an example of a channel based device mapping architecture according to one embodiment;

FIG. 6 depicts a diagrammatic representation of an example of virtualization server software architecture according to one embodiment;

FIG. 7A depicts a diagrammatic representation of an example of an Android graphics stack;

FIG. 7B depicts a diagrammatic representation of a viewable display;

FIG. 8 depicts a diagrammatic representation of an example of a system for generating remote views according to one embodiment;

FIG. 9 depicts a diagrammatic representation of an example of a system for relaying the contents of the virtual display to a client running on the physical device according to one embodiment;

FIGS. 10A and 10B depict a diagrammatic representation of one embodiment of a portion of an architecture for a virtual mobile platform;

FIG. 11 depicts a flow diagram of one embodiment of a governor for frame rate processing;

FIG. 12 depicts a diagrammatic representation of detection of macroblocks;

FIG. 13 depicts a diagrammatic representation of one embodiment of a zig-zag scan;

FIG. 14 depicts a diagrammatic representation of one embodiment of re-ordering in a linear array;

FIGS. 15A and 15B depict a flow diagram for one embodiment of a method for detecting energy distribution;

FIG. 16 depicts a diagrammatic representation of one example of macroblock comparison;

FIG. 17 depicts a diagrammatic representation of one embodiment of processing along a scanline boundary;

FIGS. 18A, 18B, 19A, 19B, 20A and 20B are flow diagrams illustrating embodiments of methods for a combined copy and compare;

FIG. 21 depicts a diagrammatic representation of one embodiment of macroblock comparison; and

FIGS. 22A, 22B and 23 are flow diagrams illustrating embodiments of methods for color space conversion of macroblocks.

DETAILED DESCRIPTION

The disclosure and various features and advantageous details thereof are explained more fully with reference to the exemplary, and therefore non-limiting, embodiments illustrated in the accompanying drawings and detailed in the following description. It should be understood, however, that the detailed description and the specific examples, while indicating the preferred embodiments, are given by way of illustration only and not by way of limitation. Descriptions of known programming techniques, computer software, hardware, operating platforms and protocols may be omitted so as not to unnecessarily obscure the disclosure in detail. Various substitutions, modifications, additions and/or rearrangements within the spirit and/or scope of the underlying inventive concept will become apparent to those skilled in the art from this disclosure.

As described above, a mobile device lives in a hostile world and, as such, securing the device itself may not be enough and/or possible. There is a desire to separate a physical device from applications that run on the device. Embodiments disclosed herein can remove the applications and services, even much of the device's operating environment, from the hostile environment. Instead, these functions are provided on protected hardware and software in a data center where they can be managed, monitored, repaired, and deployed under the care of information technology (IT) experts and administrators.

As illustrated in FIG. 1, embodiments disclosed herein can allow a user of mobile device 110 in network environment 100 to switch between using public network services 130 and using private network services 140. In particular, the user may access public network services 130 via public network 120 such as the Internet over which non-sensitive information may be communicated. However, to access private network services 140, a virtualization cloud client application (referred to hereinafter as a “VC client application”) running on mobile device 110 connects to a virtualized device (e.g., virtual device 160A) hosted in virtualization cloud 150 and brokers access to private network services 140 as well as local device functions.

Those skilled in the art will appreciate that local device functions may vary depending upon the type of mobile device 110. For example, mobile device 110 can be a touchscreen smartphone with local device functions such as the touch screen, the dialer/phone network, camera, Global Positioning System (GPS), keyboard, speakers, microphone, and so on. Other examples of mobile device 110 may include touchscreen tablets and other touch-enabled mobile devices. As will be explained in further detail below, such mobile device functions can be provided by embodiments disclosed herein on protected hardware and software in virtualization cloud 150 without adversely affecting the user's experience in interacting with mobile device 110, even if the user travels frequently from one continent to another.

In some embodiments, multiple virtualized devices may be created for the same physical device. For example, in FIG. 1, virtual device 160A and virtual device 160B may be created for mobile device 110. This feature is further described below with reference to FIG. 2.

FIG. 2 depicts a diagrammatic representation of an example of a network architecture according to one embodiment. In this example, system 200 may include virtualization cloud 250 communicatively connected to various types of mobile devices 210A . . . 210N, 211, and 215. Mobile devices 210A . . . 210N, 211, and 215 may represent different types of actual touchscreen devices such as smartphones and tablets. Mobile devices 210A . . . 210N, 211, and 215 may be owned by the same or different entities (e.g., enterprises, users, etc.). Further, mobile devices 210A . . . 210N, 211, and 215 may be programmed with different operating systems such as iOS, Android, and Windows.

Each of mobile devices 210A . . . 210N, 211, and 215 may have a VC client application installed, for instance, by an administrator or IT personnel of system 200. In one embodiment, a VC client application may be downloaded from an online device-specific app store.

In one embodiment, a VC client application may comprise software that brokers access to mobile devices' physical interfaces (e.g., soft and hard keyboards, touchscreen, GPS, camera, accelerometer, speakers, microphone, phone dialer, etc.) and Virtual Private Network (VPN) software that connects across a public network such as the Internet to servers in a virtualization cloud (e.g., virtualization cloud 150 of FIG. 1) over encrypted network interfaces. Virtualization cloud 250 may be an embodiment of virtualization cloud 150 described above with reference to FIG. 1.

Virtualization cloud 250 provides a hosted, networked, application environment. As a non-limiting example, in one embodiment, virtualization cloud 250 is configured as an Android application environment. As illustrated in FIG. 2, virtualization cloud 250 may comprise host servers 255 and management domains 260, 270.

Host servers 255 may host application services. Private network services 140 of FIG. 1 may be an embodiment of application services hosted by host servers 255 of FIG. 2. In one embodiment, a plurality of application services may execute on a collection of servers with extensions to support separation and segmentation of a core server.

Each management domain may comprise a collection of virtualized devices, hosted on one or more server machines. In an Android application environment, such virtualized devices may be referred to as virtual Android devices. From another perspective, a management domain is made up of a collection of server machines providing services to a large number of users. A collection of server machines may host virtual devices for these users and provide access to the applications and services via a remote client interface. In some embodiments, a management domain may further comprise a private application “store” for hosting installable approved enterprise applications particular to that management domain. In some embodiments, a user can have access to one or more “virtual devices” hosted in the management domain, each virtual device containing a core set of applications such as an enterprise address book, mail, calendar, web browser, etc. in addition to any preinstalled enterprise applications.

As FIG. 2 exemplifies, each mobile device (e.g., mobile device 210A, mobile device 211, mobile device 215, etc.) has a connection (via a VC client application installed thereon) to one or more server machines that host their virtual device(s) in a virtualization cloud (e.g., virtualization cloud 250). As explained below, the applications and their data located within a single virtual device are completely inaccessible to the applications and data in another virtual device. The applications are limited to the network services within their management domain and thus cannot access the network services provided in other management domains. For example, mobile device 210A may have a first virtual device hosted on a first server machine in management domain 260 and a second virtual device hosted on a second server machine in management domain 270. However, the applications and their data located within the first virtual device in management domain 260 are completely inaccessible to the applications and data within the second virtual device in management domain 270.

In some embodiments, for each connection to an application service hosted in the virtualization cloud, a different instance of the VC client application is started on the mobile device. For example, a first VC client instance may be started on mobile device 210A to access management domain 260 and a second VC client instance may be started on mobile device 210A to access management domain 270. All of the applications running in a particular management domain for a particular user will be accessed through the corresponding VC client application running on the mobile device. Additionally, the VC client application's remote connection software running in a mobile device does not expose application generated events running natively within the mobile device to the applications running in their virtual device(s), unless they are specific events from the devices brokered by the VC client application. In this way, rather than executing mobile applications in an actual device (e.g., mobile device 210A, etc.), the applications are run remotely in a virtualization cloud (e.g., virtualization cloud 250) under the watchful eyes of an enterprise's systems and network management tools and their administrators, separate from each other and from the consumer/Internet applications and data.

Turning now to FIG. 3, a diagrammatic representation of an example of a system architecture according to one embodiment is depicted. In this example, system 300 comprises virtualization cloud 350 communicatively connected to private network services 340 and various types of mobile devices 380.

Mobile devices 380 may operate in a distributed computing environment and may operate on various types of operating systems. Similar to mobile devices 110, 210A . . . 210N, 211, 215 described above, each of mobile devices 380 may have a VC client application installed thereon. The installed VC client application may be device-specific. For example, each of Android tablets 381 may have an Android tablet client, each of Android phones 383 may have an Android phone client, each of iOS iPhones 385 may have an iOS iPhone client, each of iOS iPads 387 may have an iOS iPad client, and each of Windows tablets 389 may have a Windows tablet client.

Private network services 340 may comprise enterprise services for private network 345. Non-limiting examples of private network services 340 may include IT management 301, enterprise applications 303, intranet 305, document storage 307, active directory 309, and email exchange 311. These services are known to those skilled in the art and thus are not further described herein.

Virtualization cloud 350 may comprise a plurality of system components, including storage 351, controller 353, virtual device manager 355, notification event service 357, virtual devices 359, and authentication 361. These system components may run on a single server machine or separately on multiple server machines. For the sake of convenience, and not of limitation, FIG. 3 shows each system component running on multiple physical servers.

More specifically, virtual device manager 355 (an application that manages virtual devices) may send a command to controller 353 to create a virtual device. In one embodiment, controller 353 may implement the OpenStack open source cloud computing fabric controller. OpenStack is known to those skilled in the art and thus is not further described herein for the sake of brevity.

In response to the command from virtual device manager 355, controller 353 may first select a golden image, and any applications associated with the golden image. A golden image refers to a virtual machine that was built as a template and that usually contains little, if any, more than the base operating system. A golden image may also be referred to as a gold image, clone image, master image or base image. To create a golden image, an administrator first sets up the computing environment exactly the way it is needed and then saves the disk image as a pattern for making more copies. The use of golden images can save time and ensure consistency by eliminating the need for repetitive configuration changes and performance tweaks. This approach can be compared to automated replication, which requires a configuration management tool to build new images on demand. In a self-service provisioning environment, a collection of golden images may be referred to as a golden repository, gold catalog or golden image library.

Using the selected golden image, controller 353 may create virtual device instance 359 and associate with it a storage location in storage server 351. Storage server 351 holds the persisted, physical storage of each virtual device created by controller 353. Controller 353 may then return the information on virtual device instance 359 to virtual device manager 355.
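As a minimal sketch of the provisioning flow described above, the following Python fragment copies a golden image into a new virtual device instance and associates it with a storage location. All names here (GOLDEN_IMAGES, storage_server, create_virtual_device) are hypothetical and chosen only for illustration; they are not part of OpenStack or of any controller described in this disclosure.

```python
import copy
from typing import Dict

# Hypothetical golden image library, keyed by template name.
GOLDEN_IMAGES: Dict[str, dict] = {
    "android-base": {"os": "Android", "apps": ["email", "browser"]},
}

# Hypothetical stand-in for the storage server:
# virtual device id -> persisted disk contents of that device.
storage_server: Dict[str, dict] = {}


def create_virtual_device(device_id: str, image_name: str) -> dict:
    """Create a device from a golden image and return its descriptor."""
    # Deep-copy the template so changes to the device never alter the image.
    disk = copy.deepcopy(GOLDEN_IMAGES[image_name])
    storage_server[device_id] = disk
    # The descriptor is what a controller might hand back to the manager.
    return {"id": device_id, "image": image_name, "storage_key": device_id}


info = create_virtual_device("vd-1", "android-base")
# Installing an app on the device leaves the golden image untouched.
storage_server[info["storage_key"]]["apps"].append("crm")
```

Copying rather than referencing the template is what lets one golden image serve as a pattern for many devices.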

In some embodiments, each user is assigned one or more virtual devices in one or more management domains when they are provisioned. These virtual “devices” contain applications, their settings and device configuration, as well as any data created locally in the device for the user by any installed applications. The images are maintained in network storage servers (e.g., storage servers 351) within the corresponding management domain(s). In some embodiments, as part of this image, the user is provided an emulated “flash” drive for app storage. The images can also be configured to permit access to external enterprise storage. In some embodiments, storage servers may utilize redundant storage to protect data from failures.

In some embodiments, authentication servers 361 may be configured to provide authentication and session management services. For example, when a user (via a VC client application running on a mobile device that the user is using) attempts to access an enterprise application, authentication server 361 may connect to one or more directory servers (e.g., active directory 309) to authenticate the user's access to virtual device(s) where the enterprise application can be run and to provision the user with one or more virtual devices. After the user authenticates, authentication server 361 may direct virtual device manager 355 to locate a device server that will host the user's virtual device 359. In some embodiments, authentication server 361 may ensure that virtual device 359 is “powered on” as well as initiate the initial session negotiation (via establishment of security tokens) between the mobile device running the VC client application and virtual device 359.

Those skilled in the art will appreciate that a virtual “device” is not really a device; it is a remote execution environment for all of the services and applications that make up a device. There are (at least) two main classes of device servers, “bare metal” device servers and virtual machine device servers. There are some functional, deployment, and cost differences between these types and so ultimately implementation and market demand will determine their allocation and availability.

The bare metal device servers are made up of a large number of relatively small processing units similar in performance and scale to the processing units of actual mobile devices. Each virtual device instance can run on its own physical central processing unit (“CPU”) hardware. In some embodiments, a modified version of the Simple Protocol for Independent Computing Environments (SPICE) protocol server software executes directly in the operating system (OS) on each of these instances to provide remote access.

SPICE is an open source protocol and implementation developed by Red Hat that provides remote access to virtual desktops. SPICE has a well-documented protocol that includes the ability to create new “channels” for different remote services. Embodiments extend the SPICE protocol to provide remote access to virtual devices and to broker access to the sensors of the real (physical) devices.

Virtual machine device servers are server class machines that can be found in the server market today. On the virtual machine device servers, each virtual “device” executes in its own virtual machine on a specially configured Linux device server. In some embodiments, a device server may be configured to provide Transport Layer Security (TLS) and VPN encryption, virtual device instrumentation/auditing, integrity checks and anti-virus from virtualization layer, system-side application management, learning of ‘normal’ behavior, protocol aware firewall, server-side TPM attestation, SELinux-based virtual device separation, VPN service for applications in the virtual devices, and network proxy for traffic monitoring. Some of these features are further explained below.

In some embodiments, virtual devices hosting Android (or Security Enhancements for Android (SEAndroid)) may be created for each user using Linux's Kernel-based Virtual Machine (KVM) and Quick EMUlator (QEMU).

KVM refers to a kernel-resident virtual machine infrastructure built directly into Linux as opposed to other virtualization techniques that run under Linux as a process. This architecture helps KVM operate very efficiently within Linux. KVM provides completely separate virtual environments for Android devices implementing embodiments disclosed herein. KVM itself does not provide any hardware emulation or remoting capabilities.

QEMU is a user-space emulator that works with KVM to provide the hardware emulation. While QEMU can provide processor instruction emulation, embodiments may employ it only for emulating hardware for the virtual device. For example, some embodiments use or provide emulated hardware for touch screen/display, memory/storage, audio, cameras, sensors, bypass, and networking.

Linux and KVM provide the isolation between each user and the applications that they run. It is not possible to communicate directly between the application components and services in these separate virtual containers. Thus, each “device”, while sharing physical server hardware, runs independently and is separate from the others, as depicted in FIG. 4.

FIG. 4 depicts a diagrammatic representation of an example of virtual device containment and connections according to one embodiment. In this example, virtualization cloud 400 may comprise management domain 410 (Office 1) and management domain 420 (Office 2).

Management domain 410 and management domain 420 may be hosted on device servers connected to management network 450 which provides a plurality of network services such as application management services 451A, 451B, application behavioral monitoring services 453A, 453B, user behavioral biometric services 455A, 455B, and audit services 457A, 457B.

Management domain 410 may comprise a plurality of virtual devices 459X, 459Y, 459Z implemented using OpenStack infrastructure 470A on Trusted Platform Module (TPM)-based attestation 460A. Each of the plurality of virtual devices 459X, 459Y, 459Z may include an agent of management network 450 (e.g., agents 495X, 495Y, 495Z, respectively). In some embodiments, the agent may be referred to as a mobile device management and mobile application management (MDM/MAM) agent. In this example, management domain 410 may further comprise VPN service 456A and storage service 458A.

Management domain 420 may comprise a plurality of virtual devices 429X, 429Y, 429Z implemented using OpenStack infrastructure 470B on TPM-based attestation 460B. Each of the plurality of virtual devices 429X, 429Y, 429Z may include an agent of management network 450 (e.g., agents 492X, 492Y, 492Z, respectively). In this example, management domain 420 may further comprise MDM server 452, MAM server 454, VPN service 456B, and storage service 458B.

As illustrated in FIG. 4, each of the plurality of virtual devices 459X, 459Y, 459Z in management domain 410 and each of the plurality of virtual devices 429X, 429Y, 429Z in management domain 420 has a read only partition and its own KVM/QEMU in a particular SELinux domain (e.g., read only partition 475X and KVM/QEMU 473X in SELinux domain 471X, read only partition 475Y and KVM/QEMU 473Y in SELinux domain 471Y, read only partition 475Z and KVM/QEMU 473Z in SELinux domain 471Z, read only partition 476X and KVM/QEMU 474X in SELinux domain 472X, read only partition 476Y and KVM/QEMU 474Y in SELinux domain 472Y, read only partition 476Z and KVM/QEMU 474Z in SELinux domain 472Z).

In the example of FIG. 4, the virtual devices are implemented as SEAndroid virtual devices. SEAndroid may provide benefits such as privileged daemon protection, application isolation, middleware controls, instrumentation and auditing, application install protection, limited application access to sensors, ‘untrusted’ application sandboxing, read-only core OS partition, centralized patching, and MDM/MAM controls.

In some embodiments, virtual devices can be migrated between device servers by administrative commands (via management network 450), using tools to automate the balancing of load across multiple device servers or based on geographical location.

Each of these virtual devices may be connected to a physical mobile device (e.g., smartphone 430, tablet 440, etc.). In some embodiments, a VC client application running on the physical device may be configured to provide remote two factor authentication, remote signing and decryption, TLS encryption for data in transit, GPS-based access policies, attributes exposed for MDM integration, mechanisms to improve attestation, and/or integration with the mobile device's Mobile Trusted Module (MTM).

When a user is added to a management domain, they are provisioned with a virtual device of a particular type. Similarly, when a user is removed, their virtual devices must be deactivated and their “parts” archived or reclaimed. A separate management server is used by administrators to manage the lifecycle of devices and users of a virtualization cloud (e.g., virtualization cloud 150, virtualization cloud 250, virtualization cloud 350, virtualization cloud 400, etc., collectively referred to hereinafter as the “VC system”). In some embodiments, provisioning services permit administrators to define device “types” (templates) and configurations and assign them to users depending upon the role or duty.

In some embodiments, the management of the VC system and the virtual devices can be controlled through a management policy system. Servers, storage, and virtual devices can be associated with hierarchically arranged policy containers. Policies and access to components can be controlled through these containers and their position in the hierarchy. In some embodiments, these policy containers may be referred to as policy domains and can be used to allocate and delegate control to multiple administration management domains.

For example, consider a hosted VC environment. A hosting partner wishes to support multiple enterprise customers in a single installation. At the same time, they would like to delegate much of the management to their customers. They may choose to create a single policy domain that contains shared resources such as common virtual device images, common device storage, and a shared pool of device servers. For each new customer, they create a sub-domain and grant administrative access to the customers' administrators for their respective sub-domain. In addition, they create a policy in the root domain that all resources are accessible to the sub-domains. The customers' administrators can now create assets (new device image templates, users, administrators, groups) within their own sub-domain. They, in turn, can create their own sub-domains and assign assets, users, groups, administrators, etc. to those sub-domains as well as policies to determine how resources can be inherited from the companies' sub-domain.
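The hosted example above can be sketched as a small tree of policy containers. This is only an illustration under stated assumptions: the PolicyDomain class, its share_with_subdomains flag, and the resource names are hypothetical, and the rule that inheritance stops at the first ancestor that does not share its resources is one possible design choice, not a behavior stated in this disclosure.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class PolicyDomain:
    """Hypothetical policy container arranged in a hierarchy."""
    name: str
    parent: Optional["PolicyDomain"] = None
    resources: Set[str] = field(default_factory=set)
    # Policy: are this domain's resources accessible to its sub-domains?
    share_with_subdomains: bool = False

    def create_subdomain(self, name: str) -> "PolicyDomain":
        return PolicyDomain(name=name, parent=self)

    def accessible_resources(self) -> Set[str]:
        """Own resources plus resources inherited from sharing ancestors."""
        visible = set(self.resources)
        ancestor = self.parent
        # Assumption: inheritance stops at the first non-sharing ancestor.
        while ancestor is not None and ancestor.share_with_subdomains:
            visible |= ancestor.resources
            ancestor = ancestor.parent
        return visible


# Root domain of the hosting partner holds the shared resources.
root = PolicyDomain(
    "hosting-partner",
    resources={"common-device-image", "common-device-storage", "shared-server-pool"},
    share_with_subdomains=True,
)
customer_a = root.create_subdomain("customer-a")
customer_b = root.create_subdomain("customer-b")
# Each customer's administrators create assets in their own sub-domain.
customer_a.resources.add("customer-a-template")
# A customer may create further sub-domains (e.g., per department).
department = customer_a.create_subdomain("customer-a-sales")
```

In this sketch, customer-a sees the shared pool plus its own template, customer-b never sees customer-a's template, and the department sub-domain inherits nothing until customer-a sets its own sharing policy.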

If one of these customers wants dedicated server resources to run the virtual devices or to maintain their storage, the hosting partner can add device server and storage server resources to their sub-domain(s) and thus only their virtual devices will be running or be saved on those server assets. Similarly, those systems might have different networking characteristics that would let them share a VPN connection to the enterprise as opposed to configuring a VPN within each of the virtual devices.

This organization can also be beneficial to enterprises that need to delegate management functions to different departments within their enterprise yet want to control and maintain the overall infrastructure centrally.

When migrating a user between two templates, the VC system can support intelligent upgrading, including:

-   Scheduling specific times for upgrades to occur.
-   Roll back to a previous device template if an error occurs.
-   Partial, incremental upgrade processes across a user population.
-   Detection of whether a user is active on a virtual device before enacting the upgrade.
-   Graceful shut down of a virtual device for which an upgrade is being forced.

As a non-limiting example, in some embodiments, a provisioning and management server for the virtual machine device servers described above can be built on top of a virtual datacenter management platform such as oVirt, OpenStack, or the like. oVirt and OpenStack are known to those skilled in the art and thus are not further described herein. oVirt provides the underlying data services for managing and accessing virtual machines. The VC system provides an abstraction interface that hides much of the complexity of the underlying virtual datacenter management platform when trying to manage multiple management domains within a single system. In some embodiments, SPICE may be integrated into the virtual datacenter management platform, allowing users to connect to virtual machines through SPICE.

In some embodiments, an administrator might want to allow users to access a mobile virtual device without persisting the state of the virtual device beyond a given user's session. In this case, the virtual device may be deleted when the session ends. In some embodiments, the virtual device may optionally warn the user that the virtual device is operating in a kiosk mode when the user logs in, and delete the virtual device when the user logs out. Essentially, the kiosk mode provides a ‘fresh’ virtual device based on a specified template each time a user logs in.

In a variant of the kiosk mode, a virtual device can be set to synchronize certain enterprise data (e.g., recent email) when the user logs into the kiosk mode device, but the virtual device is still deleted when the user logs out. In this way, any new enterprise data is placed back into the enterprise applications that should own each respective data type. This allows the user to move between server node clusters (e.g., moving between countries) without concern about moving or synchronizing virtual device state between the different servers.
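The kiosk mode lifecycle described above (a fresh device at login, optional synchronization of enterprise data, deletion at logout) can be sketched as a context manager. The names kiosk_session and live_devices, and the dictionary used to represent a virtual device, are hypothetical and exist only for this illustration.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Optional

# Hypothetical in-memory stand-in for the set of live virtual devices.
live_devices: Dict[str, dict] = {}


@contextmanager
def kiosk_session(user: str, template: str,
                  sync_items: Optional[List[str]] = None) -> Iterator[dict]:
    """Provide a fresh device for one session and delete it afterwards."""
    # Login: build a new device from the template, optionally pre-loaded
    # with synchronized enterprise data (e.g., recent email).
    device = {"user": user, "template": template, "data": list(sync_items or [])}
    live_devices[user] = device
    try:
        yield device
    finally:
        # Logout: the device is deleted even if the session ended in error.
        del live_devices[user]


with kiosk_session("alice", "sales-template", ["recent email"]) as first:
    first["data"].append("local draft")   # state created during the session

with kiosk_session("alice", "sales-template") as second:
    fresh_data = list(second["data"])     # nothing carried over from before
```

Because nothing survives logout, a user can be served from any server cluster without moving device state between servers.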

The VC system may support additional modes of operation. For instance, a published app mode may enable an organization to offer specific applications in remote ‘containers’ to large user populations. An example would be a bank using the published app mode to make an online banking application available to its customers, while hosting that online banking application in their own data centers on their own locked down OS image.

In such a published app mode, the end client application icon can be customized to enable white labeling. For example, when the user logs in, the published application is already open and in focus. When the user quits the application, the remote connection closes. In some embodiments, the published app mode can be coupled with the kiosk mode described above so that the virtual device does not have a persistent state.

In some embodiments, an organization may wish to provision a virtual device (whether a full device, kiosk mode, published app, etc.) to a person not employed by that organization, and the user need only download a VC client application or add the account to their existing VC client application on their mobile device(s).

In some embodiments, an organization may wish to provision one or more virtual devices to one or more employees at a partner organization. In this case, the publishing organization can liaise with the consuming organization to add a VC client application and/or set of authentication settings to the consuming organization. One of the advantages of this approach is that the publishing organization can leverage the user provisioning and authentication mechanisms of the consuming organization. For example, access to the VC client application can become a setting in the consuming organization's active directory, and users in the consuming organization must already have authenticated to the consuming organization in order to have access to the publishing organization's applications/virtual devices.

In this scenario, doing two remoting steps would add latency and complexity to the VC system. To avoid this, when the user connects to the publishing organization's virtual device, the VC client application on the user's physical device can connect to the publishing organization's VC servers via a bypass channel in the VC server of the consuming organization.

As described above, SPICE can create new “channels” for different remote services. Different types of data can be communicated between a mobile device running a VC client application and a virtual device running in the VC system via different SPICE channels. These SPICE channels are mapped to virtual input/output channels.

FIG. 5 depicts a diagrammatic representation of an example of channel based device mapping architecture 500 according to one embodiment. In this example, data (e.g., display data, audio data, location data, etc.) may be communicated from a mobile device (e.g., client side 510) via various SPICE channels (e.g., main channel 511, display channel 513, audio record channel 515, audio playback channel 517, cloud channel 519, Call Admission Control (CAC)/Signaling Controller (SC) channel 521, etc.) to a server in the VC system (e.g., server side 550). Channel based device mapping architecture 500 may include a virtual device mapping module embodied on a non-transitory computer readable medium and configured for mapping the incoming data to the appropriate virtual device component (e.g., internal component 551, proprietary video graphic adapter (VGA) 553, etc.) and/or virtual input/output channels 555, each associated with a particular virtual driver. This is further described below with reference to FIG. 6.
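The mapping of incoming channel data to virtual drivers can be pictured as a dispatch table. The following is a minimal sketch, not the mapping module itself: the channel keys echo the channel names in FIG. 5, while the handler functions, the CHANNEL_TO_DRIVER table, and the handled log are hypothetical and do not use any real SPICE library API.

```python
from typing import Callable, Dict, List

# Record of what each hypothetical virtual driver received.
handled: List[str] = []


def main_handler(payload: bytes) -> None:
    handled.append(f"main:{len(payload)} bytes")


def audio_record_handler(payload: bytes) -> None:
    handled.append(f"audio_record:{len(payload)} bytes")


def cloud_handler(payload: bytes) -> None:
    handled.append(f"cloud:{len(payload)} bytes")


# Hypothetical mapping from channel name to the virtual driver for it.
CHANNEL_TO_DRIVER: Dict[str, Callable[[bytes], None]] = {
    "main": main_handler,
    "audio_record": audio_record_handler,
    "cloud": cloud_handler,
}


def dispatch(channel: str, payload: bytes) -> None:
    """Route incoming channel data to the driver mapped to that channel."""
    try:
        driver = CHANNEL_TO_DRIVER[channel]
    except KeyError:
        raise ValueError(f"no virtual driver mapped for channel {channel!r}") from None
    driver(payload)


dispatch("audio_record", b"\x00\x01\x02\x03")
dispatch("main", b"hello")
```

A table of this kind keeps each new remote service a matter of registering one more channel and one more driver.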

FIG. 6 depicts a diagrammatic representation of an example of virtualization server software architecture according to one embodiment. As a non-limiting example, virtualization server software architecture 600 may implement a modified version of Android OS.

As illustrated in FIG. 6, virtualization server software architecture 600 may comprise a plurality of software components. At its core is a Linux kernel with specialized core drivers 630 to abstract the hardware layer from the application runtimes. Channel data 610 are received into a virtual device's KVM/QEMU 620, mapped via virtual input/output channels 639, and handled by corresponding virtual device drivers (e.g., display driver 631, universal serial bus (USB) driver 633, disk driver 635, binder/inter-process communication (IPC) driver 637, camera driver 632, input driver 634, power management 636, and network driver 638, etc.). These “virtual” device drivers replace the drivers for a real device and communicate using QEMU and the SPICE protocol with a VC client application executing on the mobile device for access to the real devices and the services they provide.

Virtualization server software architecture 600 may further comprise a collection of libraries for accessing data, working with text and graphics, encryption and communication, and the underlying OS. In the case of Android OS, each virtual device session includes a full complement of Android's application framework, libraries, runtime, and applications. However, some kernel-based services provided within a virtual device server are modified. For example, power management services are simulated and significantly altered as battery support is not an issue in a virtual device. User interface (UI) indicators for batteries and other elements not applicable to the virtual device can be made to reflect the values of the client device.

As another example, applications running in a virtual device do not use the local device's Wi-Fi or data network. Instead, they use the Internet Protocol (IP)-based network services provided by the virtual device servers. In some embodiments, an “always-on” network interface may be provided to the applications. Wi-Fi and data connectivity management applications the user may install in the virtual device may have no relevance.

Virtualization server software architecture 600 may include additional virtual drivers not shown in FIG. 6. Many of the virtual drivers may communicate with a VC client application running on a mobile device using extensions to the SPICE protocol. Some are designed to improve performance whereas others provide access to features expected in a mobile device. Some example virtual drivers are further described below.

Virtual sensors driver: provides access to the remote client's sensor devices such as the GPS, the gyroscope, the accelerometer, a compass, battery level, Wi-Fi signal strength, and 3G/4G signal strength. Other sensor types can be added as needed.

When an application requests access to a sensor such as the GPS, the sensors driver sends a device message that results in a sensor request being sent to the remote client. The remote client application then makes a similar request to the physical device and begins forwarding sensor data back to the sensors driver as the sensor produces data. When the application no longer needs the sensor information, a close request is sent back to the client, which then stops monitoring the specified sensor.
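The request, forward, and close sequence above can be sketched with two small classes. This is a simplified illustration in which the network is replaced by direct method calls; the class names RemoteClient and VirtualSensorsDriver and all of their methods are hypothetical.

```python
from typing import Callable, Dict, List


class RemoteClient:
    """Stand-in for the VC client application on the physical device."""

    def __init__(self) -> None:
        # Sensors currently being monitored, with where to forward data.
        self.monitored: Dict[str, Callable[[str, float], None]] = {}

    def open_sensor(self, sensor: str, on_data: Callable[[str, float], None]) -> None:
        self.monitored[sensor] = on_data

    def close_sensor(self, sensor: str) -> None:
        self.monitored.pop(sensor, None)

    def physical_reading(self, sensor: str, value: float) -> None:
        """Called when the physical sensor produces a value."""
        forward = self.monitored.get(sensor)
        if forward is not None:          # only monitored sensors are relayed
            forward(sensor, value)


class VirtualSensorsDriver:
    """Stand-in for the sensors driver inside the virtual device."""

    def __init__(self, client: RemoteClient) -> None:
        self.client = client
        self.readings: Dict[str, List[float]] = {}

    def request(self, sensor: str) -> None:
        self.readings.setdefault(sensor, [])
        self.client.open_sensor(sensor, self._on_data)

    def release(self, sensor: str) -> None:
        self.client.close_sensor(sensor)

    def _on_data(self, sensor: str, value: float) -> None:
        self.readings[sensor].append(value)


client = RemoteClient()
driver = VirtualSensorsDriver(client)
driver.request("gps")
client.physical_reading("gps", 30.27)   # forwarded to the virtual device
driver.release("gps")
client.physical_reading("gps", 30.28)   # ignored: sensor no longer monitored
```

Stopping the monitoring on the client side after the close request is what avoids both needless traffic and needless sensor use on the physical device.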

Some sensors, such as the GPS, can draw significant battery power while running. To prevent unnecessary battery drain, the VC client application running on the physical mobile device can request that the GPS on the local mobile device be turned on or off based on the requirements of applications running on the virtual device in the VC system.

Some sensors such as the accelerometer may change values very frequently. The VC client application can be configured to sample and relay accelerometer values from the local physical device based on attributes and requirements of the app running on the virtual device in the VC system as well as the performance of the network connection between the local and virtual devices (higher network latency and lower available bandwidth result in fewer sensor values being communicated).
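One way to picture this throttling is a function that derives a relay rate from the rate the app asks for and from the measured network conditions. The formula below is purely an assumption made for illustration: the function name relay_rate_hz, the 50 ms latency reference, the 16-byte sample size, and the 5% bandwidth budget are all hypothetical values, not parameters given in this disclosure.

```python
def relay_rate_hz(requested_hz: float, latency_ms: float, bandwidth_kbps: float,
                  bytes_per_sample: int = 16, budget_fraction: float = 0.05) -> float:
    """Return how many sensor samples per second to relay (illustrative)."""
    if requested_hz <= 0:
        return 0.0          # the app is not asking for this sensor at all
    # Lower bandwidth -> fewer samples: sensor traffic may use only a
    # small fraction of the available bytes per second.
    bytes_per_second = bandwidth_kbps * 1000 / 8
    bandwidth_cap_hz = bytes_per_second * budget_fraction / bytes_per_sample
    # Higher latency -> fewer samples: scale down once latency exceeds 50 ms.
    latency_scale = min(1.0, 50.0 / max(latency_ms, 1.0))
    return min(requested_hz, bandwidth_cap_hz) * latency_scale


good_network = relay_rate_hz(50, latency_ms=20, bandwidth_kbps=10_000)
high_latency = relay_rate_hz(50, latency_ms=200, bandwidth_kbps=10_000)
low_bandwidth = relay_rate_hz(50, latency_ms=20, bandwidth_kbps=64)
```

Under these assumed constants, a good connection relays at the full requested rate, while either a slow or a narrow connection reduces the number of values sent.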

A specific example of this is how the VC system synchronizes the orientation of the remote virtual device to the orientation of the local device: it continually monitors and relays orientation change events from the accelerometer on the local device, even if the application on the remote virtual device is not monitoring the accelerometer data, while not relaying every minor rotation of the device.

Additional sensors that the VC system can remote from the local device to the virtual device may include the network type, network signal strength, battery charge remaining, light sensor (used for screen dimming), Bluetooth, peripheral device connectivity and the state of any local payment credential.

Virtual touchscreen driver: supports remoting of multi-touch actions and also gestures. Multi-touch gestures can be used for zooming, rotating and other similar operations. In one embodiment, the SPICE mouse channel may be modified for this purpose. In some embodiments, a designated channel is used for this purpose.

Audio and video bypass driver: improves the performance of audio and video processing for both the VC server and the VC client. While embodiments can work without bypass, there is a CPU cost on both the client and the server when using the internal video processing of the host operating system (e.g., Android). To this end, modified media framework 645 is provided to replace audio and video players that came with the OS with special players that implement the bypass functions. For example, when an application requests to play a video using the Android video player (either full-screen or embedded), the bypass video player captures either the video data or a Uniform Resource Locator (URL) that points to an address where the actual video file resides, and passes it via the bypass driver to the remote client. The client then spawns a local video player and plays the video stream. In the case of network video sources, the entire stream can be handled outside of the virtual device via a network proxy.

Audio bypass works much like video bypass. The audio player is replaced to provide proxy access to audio data in the client.

Virtual camera driver: remotes a camera using a combination of a virtual camera device driver and modifications to the camera functions in the media framework. When the camera activity or fragment is loaded in the virtual device, the modified camera viewer and virtual camera driver send a request to the client to bring up the camera. Once a picture is taken, the picture or video is sent to the virtual device server where it can be placed in the flash storage of the virtual device or can be delivered to an anti-virus scanner and then placed in enterprise storage.

Virtual display driver: optimizes delivery of graphics to a remote client. More specifically, the graphics layer can be instrumented to generate messages via a virtual display driver instead of writing directly to a frame buffer. In some embodiments, surface manager 641 in libraries 640 is implemented to handle partial updates to the Android display. In some embodiments, surface manager 641 may work in conjunction with graphics API 643 to provide acceleration for various commands issued by applications and the Android OS.

These and other virtual drivers support remote access for applications 660 running on application frameworks 650 in the virtual device. Operation of the virtual device, including processes associated with applications 660, as well as user behaviors can be monitored via various components in application frameworks 650 (e.g., resource manager 651, location manager 653, agent 655, notification manager 657, activity manager 659, content providers 661, telephony manager 663, package manager 665, window manager 667, system view 669, Extensible Messaging and Presence Protocol (XMPP) communications service 671, etc.), some of which will be further described below.

As described above, a physical mobile device is separated from applications, which are run on protected hardware and software in a data center where they can be managed, monitored, repaired, and deployed under the care of information technology (IT) experts and administrators. As such, visual displays generated by applications run on remote hardware are displayed on the physical mobile device. One challenge in such a system is providing the remote views on the physical device with as little delay as possible. Described below are techniques for providing visual displays in an efficient manner.

To better understand the techniques described below, it is helpful to understand how graphics are displayed on a typical mobile device. For the purposes of this description, an exemplary Android environment will be used to describe how graphics can be displayed on a mobile device. Other examples and environments are also possible, as one skilled in the art will understand. For example, the techniques and embodiments described herein may be utilized in association with iOS, Windows, or the like. Generally, Android applications convert events into visual displays. The Android applications consume events that describe changes in the environment (e.g., GPS) and user actions (e.g., screen touches). After processing these events, apps construct visual displays that (presumably) reflect the meaning of the events.

In some embodiments, a system collects events from a local, physical mobile device, transmits information relating to the events to a distant virtual Android device, and returns the resulting visual display to the local physical device. Such a system presents challenges such as network latency and bandwidth consumption. Described below are techniques that address these challenges. The techniques described improve the responsiveness of remote applications, as well as reduce video bandwidth consumption.

FIG. 7A is a diagram illustrating the structure of an exemplary Android graphics stack in a typical Android application. Generally, the Android system constructs a display through two composition operations, described below. FIG. 7A illustrates functions that happen within an Android application and within the Android system software, as illustrated by the brackets at the far right of FIG. 7A.

Each view in an application's display represents a different way of visualizing the application's internal state. It follows, then, that each view requires unique logic to draw and maintain it. When an application starts, one task is to register with the Android system to receive events. As discussed above, events describe changes to the environment or the user. The events may cause the application to change one or more of its views to reflect the arrival of the event. That is, when a software object receives an event, it may change the way it draws into its view.

FIG. 7A shows events arriving at a plurality of software objects (710A, 710B, 710C) of an application. The events may come from any source, such as sensors, a touch screen, etc. Each software object processes the received events according to the needs of the application. A given application's visual content is typically divided into several sub-areas called “views.” The software objects (710A, 710B, 710C) in FIG. 7A draw the views (712A, 712B, 712C), respectively. Each visible application in the Android system has at least one “surface” on which it may draw. With assistance from the Android system, the application composes the views (712A, 712B, 712C) within the application's surface(s), typically (though not necessarily) using objects called “layouts.” FIG. 7A shows the resulting surfaces (714A, 714B, 714C). The Android system software also has surfaces it manages for itself, for example, a “navigation bar” surface 714D and a “status bar” surface 714E.

At the bottom of the Android graphics stack illustrated in FIG. 7A, the Android system composes these surfaces into the display 716 the user sees. FIG. 7B shows the visible result of the display, which is what is viewable by a user. The visible result of the display includes components 720A, 720B, 720C, 720D, and 720E originating from the surfaces 714A, 714B, 714C, 714D, and 714E, respectively. When the user switches to a different application, the Android system will compose that application's visual content into the display.
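The two composition operations (views into a surface, then surfaces into the display) can be illustrated with a toy compositor that uses characters in place of pixels. This is only a conceptual sketch: the compose function and the Layer tuple are hypothetical and bear no relation to the actual Android graphics APIs.

```python
from typing import List, Tuple

# A layer is (top row, left column, rows of "pixels"); characters stand in
# for pixels. The same shape is used for views and for surfaces.
Layer = Tuple[int, int, List[str]]


def compose(width: int, height: int, layers: List[Layer]) -> List[str]:
    """Draw layers in order onto a blank canvas; later layers draw on top."""
    canvas = [[" "] * width for _ in range(height)]
    for top, left, rows in layers:
        for dy, row in enumerate(rows):
            for dx, pixel in enumerate(row):
                y, x = top + dy, left + dx
                if 0 <= y < height and 0 <= x < width:   # clip to the canvas
                    canvas[y][x] = pixel
    return ["".join(row) for row in canvas]


# First composition: the application composes its views into its surface.
view_a: Layer = (0, 0, ["aa", "aa"])
view_b: Layer = (0, 2, ["bb", "bb"])
app_surface = compose(4, 2, [view_a, view_b])

# Second composition: the system composes surfaces into the display,
# including its own status bar and navigation bar surfaces.
display = compose(4, 4, [
    (0, 0, ["SSSS"]),       # status bar surface
    (1, 0, app_surface),    # application surface
    (3, 0, ["NNNN"]),       # navigation bar surface
])
```

Here the display ends up with the status bar on the first row, the application's two views side by side on the middle rows, and the navigation bar on the last row.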

In the virtual mobile device platform described above, the visual display on the physical mobile device is remote from the virtual device and the virtual device's virtual display. As mentioned, the system collects events from the local, physical device; transmits them to a distant, virtual Android device; and returns the resulting visual display to the local, physical device. FIG. 8 is a block diagram illustrating a system for implementing a process of generating remote views.

FIG. 8 is divided by a dashed line to illustrate the separation between the virtual device and the remote physical device. As shown, events arrive at the physical device, and are provided to the client application 830. The events are relayed to the virtual device via event relay 832. For clarity, FIG. 8 (as well as FIGS. 9, 11, and 12) illustrates an event relay output feeding into the Android graphics stack. However, note that the event relay actually feeds into an application, and thence to the graphics stack. Typically, when an application is running, it incorporates parts of the graphics stack into itself. The events are then processed by the Android graphics stack 834, similar to that described above with respect to FIG. 7A. The resulting display is provided to a video encoder and relay 836. The video encoder and relay 836 encodes and compresses the display information, and returns it to the client application 830. At the physical device, the client application 830 decodes and decompresses the display information and provides the information to the mobile device graphics stack 838, which then provides display information to the mobile device display 840.

FIG. 9 is a diagram illustrating a more detailed example of the system shown in FIG. 8. As before, events arrive at the physical device, and are provided to the client application 930. The events are relayed to the virtual device via event relay 932. The events are then processed by the Android graphics stack 934. The various components of the graphics stack 934 are similar to the components illustrated in FIG. 7A. The resulting display is provided to a video encoder and relay 936. The video encoder and relay 936 encodes and compresses the display information, and sends it to the client application 930. At the physical device, the client application 930 decodes and decompresses the display information and provides the information to the mobile device graphics stack 938, which generates display 940, which is displayed on the display of the physical device.

The system illustrated in FIGS. 8 and 9 has various advantages and disadvantages. First, it always works, in the sense that the physical display will always correctly reflect the content of the remote, virtual display. One challenge with the system shown in FIGS. 8 and 9, however, is that the system is bandwidth intensive. Even with the benefit of data compression, transmitting the entire display to the client consumes a lot of bandwidth. The bandwidth requirement may, in turn, introduce network latency. On a physical device in a user's hand, there is essentially zero latency between when the user generates events and when a locally installed application receives them. Similarly, there is essentially zero latency between when a locally installed application produces new visual content and when the display offers it to the user. The mechanism illustrated in FIGS. 8-9 may introduce substantial latency at both points. It is therefore desirable to use techniques to address the bandwidth and latency issues.

Embodiments as disclosed herein may thus address these bandwidth and latency concerns at least in part through the use of an encoder that efficiently compresses the video data. If compressed video is transmitted to the display on the virtual mobile device, one can dramatically lower the bandwidth required to synchronize the displays (e.g., between the virtual mobile device platform and the physical mobile device), as compared to sending raw video data. A good compression algorithm will allow embodiments to be much less sensitive to latency (e.g., network latency). The use of compression may, however, itself introduce some amount of latency due to the time it takes to compress the video data. Moreover, the use of compression in a virtual mobile device environment may introduce additional computational requirements to what may already be a computationally intensive environment. Accordingly, embodiments as disclosed may attempt to optimize the generation and encoding of video data within the context of the virtual mobile device to reduce the computational requirements of the generation and encoding of the video data, reduce latency introduced by the compression itself and reduce the network latency introduced by the transport of poorly encoded video data.

The efficient encoding of display data optimized according to embodiments as disclosed may thus allow for the efficient encoding and transmission of display data generated at the virtualized mobile device to the client application on the physical mobile device, including efficient use of available bandwidth on transmission networks. These efficiencies may, in turn, enable a high quality and real-time virtualized user experience at the physical mobile device, even over a wireless network connection (e.g., IP based, cellular or other computer based wireless networks). Moreover, by optimizing the generation and encoding of the display data, the use of computing resources at a virtual mobile platform may be reduced, enabling computer hardware to be more efficiently utilized by, for example, enabling more instances of a virtual mobile device platform to be executed on a given set of computer hardware.

It may now be helpful to give an overview of the implementation of embodiments of a video encoder that may be employed by embodiments as disclosed herein. Referring to FIGS. 10A and 10B, block diagrams of architectures of embodiments including just such a video encoder are depicted. It will initially be recalled from the above discussion that in certain embodiments of a virtual device, the virtual device includes an instance of a guest operating system 1012 (e.g., the Android OS) executing on the KVM/QEMU 1010. Video encoder and relay (referred to also as just the video encoder) 1036 may operate in the KVM/QEMU 1010 to receive a display generated by the guest OS (e.g., Android) graphics stack (also referred to as the Android display system) 1034 running in the Android guest operating system (OS) 1012 as depicted in FIG. 10A. In another embodiment, video encoder 1036 may operate directly on the Android guest OS 1012 to receive the display generated by the Android display system 1034 as depicted in FIG. 10B. The video encoder 1036 operates to encode and compress the display information, and sends it to the client application 1030 on the physical mobile device 1090 where it may be rendered by the client application 1030 and presented to the user.

More specifically, the video encoder 1036 may receive a frame of display data from the Android display system 1034 of the Android guest OS 1012. This original frame 1032 produced by the display system 1034 and received by the video encoder 1036 may include a plurality of macroblocks in the Red Green Blue (RGB) color space. The video encoder 1036 includes an I/O thread 1014 running pre-processor 1038. The I/O thread 1014 may be part of the display system of the KVM/QEMU 1010 in the case where the video encoder 1036 is operating in the KVM/QEMU (e.g., as depicted in FIG. 10A). The pre-processor 1038 of the I/O thread 1014 receives the original frame 1032 in the RGB color space and, among other tasks, converts this original frame 1032 to the YUV color space and provides converted frame 1042 comprising macroblocks in the YUV color space to the display encoder 1044 running in display encoder thread 1046. This display encoder 1044 may compress or perform other operations on the converted frame 1042 (or one or more macroblocks thereof) and send the encoded frame 1048 to the client application 1030 on the physical mobile device 1090 over a network (e.g., IP based, cellular, another computer based wireless network or some combination of networks). The display encoder 1044 may, for example, operate according to one or more compression standards or modified versions thereof. For example, in one embodiment, the display encoder 1044 may implement a block based encoder such as an encoder operating according to a version of the H.264 video compression standard. Other types of encoders are possible and are fully contemplated herein.
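By way of non-limiting illustration, the color space conversion performed by the pre-processor may be sketched as follows. The disclosure does not specify which RGB-to-YUV matrix is used; this sketch assumes the full-range BT.601 coefficients commonly used by JFIF, and the function names (rgb_to_yuv, convert_macroblock) and the 16x16 block size are hypothetical choices made only for the example.

```python
def rgb_to_yuv(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YUV.

    Assumption: full-range BT.601 (JFIF) coefficients; the source text
    does not name a particular conversion matrix.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    # Round to the nearest integer and clamp to the 8-bit range.
    return tuple(max(0, min(255, round(c))) for c in (y, u, v))


def convert_macroblock(block):
    """Convert a macroblock given as rows of (r, g, b) tuples.

    Hypothetical helper: returns a new block of (y, u, v) tuples with
    the same dimensions as the input.
    """
    return [[rgb_to_yuv(*pixel) for pixel in row] for row in block]


if __name__ == "__main__":
    # A uniform mid-gray 16x16 macroblock (illustrative size).
    gray_block = [[(128, 128, 128)] * 16 for _ in range(16)]
    converted = convert_macroblock(gray_block)
    print(converted[0][0])
```

A practical pre-processor would likely operate on packed frame buffers and may subsample the chroma planes; the per-pixel form above is intended only to show the arithmetic.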

To aid in the pre-processing of the original frame 1032 and the encoding of the converted frame 1042, video encoder 1036 may maintain one or more original previous frames 1052 in a buffer in memory or in another storage location. Additionally, in some embodiments, metadata 1056 for the previous frame may also be maintained. For example, an original last (or previous) frame 1052 may be the frame immediately previous to the original frame 1032 currently being processed. The original previous frames 1052 may be stored as an RGB color space version of the original previous frame, may be stored as a YUV color space version of the original previous frame, or both versions may be stored. Additionally, one or more encoded previous frames 1057 may be maintained by the video encoder 1036 in a buffer in memory or in another storage location to aid in the encoding of the converted frame 1042. For example, an encoded last frame 1057 may be a compressed version of the frame immediately previous to the original frame 1032 currently being processed. It will be understood then, that at some point one or more versions of the current frame (e.g., original frame 1032, converted frame 1042 or encoded frame 1048) may be saved as (e.g., replace) an original last frame 1052 or an encoded last frame 1057. This saving may entail a copy of one location in memory (e.g., buffer) to another location in memory (e.g., buffer), a shift of pointers to memory locations or another type of memory movement, replacement or referencing.

Bidirectional Display Sync

As may be realized, the frame processing pipeline depicted in FIGS. 10A and 10B comprising the Android display system 1034 and the video encoder 1036, including the I/O thread with the pre-processor 1038 and the display encoder thread 1046 with the display encoder 1044, may be computationally intensive. In some cases, it may be possible to reduce both the latency and the computational resources or overhead required to implement such a pipeline by synchronizing the various components of the pipeline.

In particular, an instance of a guest operating system (e.g., the Android OS) may be substantially equivalent to an instance of an operating system configured to operate directly on a hardware device (e.g., a physical mobile device) and to generate frames of display data at a rate determined by an application executing on that OS. Such a frame rate may be on the order of, for example, 60 frames per second (FPS). While such a frame rate can usually be accommodated without issue when the guest OS is operating directly on a hardware device, such a frame rate may be greater than what is needed or desired in a virtual mobile platform setting such as embodiments described.

In fact, in some cases because of the latency introduced by the use of the virtual mobile device platform, the processing involved in encoding a frame for a client device or the need to transmit data over a network to the client application at the physical mobile device, the frame rate utilized in a virtual mobile device platform may be closer to around 15-20 FPS.

In particular, when an operating system (e.g., the guest OS) is running in a virtualized environment and sending display data to a client application over a network, the “display device” in the virtual mobile device platform (e.g., the video encoder) may be an emulated device running on the virtual machine (e.g., KVM/QEMU) or in the guest OS itself. Thus, the real refresh frequency is determined by the operation of the emulated display device (e.g., video encoder) and may be both dynamic and heavily dependent on processing speed and transmission time of the display data. Thus, to conserve computational resources, it is desired to synchronize components of the display pipeline in the virtual mobile device platform to this dynamic real refresh rate. It is also desirable to avoid unnecessary display processing. In a virtual mobile device platform, however, accomplishing these desires is not straightforward, at least due to the many virtual (or non-virtual) display components utilized in the virtual mobile device platform that are designed and configured to be utilized with physical hardware.

To accomplish these desires then, among others, embodiments as disclosed herein may synchronize the frame rate of the guest OS (or display system of the guest OS) to the refresh rate of the video encoder by synchronizing the generation of a frame by the display system of the guest OS to the transmission of an encoded frame by the display encoder. Specifically, in certain embodiments, this may be accomplished by configuring the output or transmission of an encoded frame to trigger the generation of a frame by the display system and blocking the execution of display generation on the guest OS until the generation of the frame is triggered. In this manner, computational resources consumed by the execution of the display processing thread of a guest OS may be reduced.

Additionally, a display handling thread of the hypervisor (e.g., KVM/QEMU) may be synchronized to the generation of an original frame by the display system of the guest OS. In particular, a display handling thread of KVM/QEMU may be blocked until a new frame is generated by the display system of the guest OS. In this manner, computational resources that would otherwise be consumed by at least some components of the hypervisor's display processing thread may be conserved until such a time as those components are needed to process an original frame.

To illustrate in more detail with reference to FIGS. 10A and 10B, display system 1034 may have a display control process 1070 that controls the generation of original frames 1032 by the display system 1034 based on display data provided by various applications executing on the guest OS (e.g., Android OS) 1012. In other words, at some interval, the display control process 1070 may process a set of display data provided by applications to generate an original frame 1032. In some cases, this display control process 1070 may be configured to operate at a certain interval based on one or more variables to achieve a certain frame rate. This capability may be referred to, for example, as VSYNC. This VSYNC configuration may allow the display control process 1070 to operate according to a timer, such that the display control process 1070 blocks until the expiration of the VSYNC timer, at which point the process 1070 unblocks, processes the current display data from the applications and generates another original frame 1032. The VSYNC timer is then reset and the display control process 1070 once again blocks until the expiration of the timer.

While blocking on a timer may be useful in certain instances, in other embodiments, as the frame rate (e.g., rate of generation of frames) of display encoder 1044 may be dynamic, it may be desirable to block the display control process 1070 based on the output of the display encoder 1044 itself. Accordingly, in some embodiments, the display control process 1070 may include a display control synchronizer 1072 responsive to the output of the display encoder 1044 and configured to block the display control process. In other words, when display encoder 1044 generates an encoded frame 1048, the generation of this encoded frame (e.g., the writing of the encoded frame 1048 into a buffer or memory location) may signal the display control synchronizer 1072 which, in turn, may unblock the display control process 1070.

The display control process 1070 may then process display data from applications by invoking or signaling Android display system 1034 such that a new original frame 1032 is generated, at which point the display control process 1070 may once again block until such a time as display control synchronizer 1072 is signaled that a new encoded frame 1048 has been generated. In this manner, the Android display system 1034 and the video encoder 1036 may serve to operate, respectively, as a synchronized producer and consumer, with the Android display system 1034 configured to produce a new original frame 1032 only when the video encoder 1036 is ready to process an original frame 1032, thus avoiding the use of computational resources for executing Android display system 1034 to produce original frames 1032 that might just be dropped or discarded by the video encoder 1036.

In one embodiment, the display control synchronizer 1072 may be implemented as a mutex or semaphore or similar variable that is signaled based on the output of the display encoder 1044. For example, the generation of an original frame 1032 may cause the semaphore or mutex to “lock” such that the display control process 1070 is blocked (e.g., does not execute or does not cause Android display system 1034 to execute). Similarly, the generation of an encoded frame 1048 may cause the KVM/QEMU 1010 or guest OS 1012 to signal or “unlock” the semaphore or mutex, causing the display control process 1070 to unblock and begin executing, in turn causing Android display system 1034 to generate a new original frame 1032. To the Android OS 1012 the display control synchronizer 1072 may appear as a hardware interrupt or the like issued from hardware on which it is executing.
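By way of non-limiting illustration, the producer and consumer relationship described above may be sketched with a counting semaphore. The class and function names below (DisplayControlSynchronizer, run_pipeline) are hypothetical, frames are represented by integers, and the single initial permit that allows the very first frame to be generated is an assumption of this sketch rather than a detail stated in the disclosure.

```python
import queue
import threading


class DisplayControlSynchronizer:
    """Hypothetical stand-in for display control synchronizer 1072."""

    def __init__(self):
        # Assumption: one initial permit so the first frame can be produced.
        self._sem = threading.Semaphore(1)

    def wait_for_encoder(self):
        # Blocks the display control process until an encoded frame is output.
        self._sem.acquire()

    def encoded_frame_written(self):
        # Called when the encoder writes an encoded frame ("unlock").
        self._sem.release()


def run_pipeline(num_frames):
    """Run a toy display system and encoder; return the ordered event log."""
    sync = DisplayControlSynchronizer()
    original_frames = queue.Queue(maxsize=1)  # buffer for the original frame
    log = []

    def display_control():
        for n in range(num_frames):
            sync.wait_for_encoder()       # blocked until the encoder is ready
            log.append(("generated", n))  # generate a new original frame
            original_frames.put(n)

    def display_encoder():
        for _ in range(num_frames):
            n = original_frames.get()     # blocks until a frame is available
            log.append(("encoded", n))    # encode and output the frame
            sync.encoded_frame_written()  # unblock the display control process

    threads = [threading.Thread(target=display_control),
               threading.Thread(target=display_encoder)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    for event in run_pipeline(3):
        print(event)
```

In this sketch no original frame is generated until the previous one has been encoded, so no generated frame is ever dropped; the blocking queue get on the encoder side plays a role loosely analogous to the I/O thread waiting on the production of an original frame.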

Similarly to using the generation of an encoded frame 1048 to unblock the display control process 1070 of the guest OS 1012, the production of an original frame 1032 by the Android display system 1034 may be used as a trigger to unblock a display processing thread of the KVM/QEMU 1010 (or of the guest OS 1012). As has been noted, most hypervisors, including KVM/QEMU, have been designed to emulate hardware. Accordingly, the display processing threads of the KVM/QEMU, including for example the I/O thread, have been configured to operate according to frame rates associated with such hardware. So, for example, the I/O thread 1014 may serve as a display processing thread for the KVM/QEMU 1010 and may operate to check and attempt to process original frames 1032 at a rate of 60 FPS by calling pre-processor 1038. As discussed, however, in many instances the display pipeline of embodiments may operate at a lower frame rate, such as 15-20 FPS. Thus, if display encoder 1044 is busy processing another frame, the output of the pre-processor 1038 (e.g., converted frame 1042) may get dropped or discarded. Additionally, in some embodiments, the guest OS (or display component of the guest OS such as Android display system 1034) will be configured to only generate an original frame 1032 when an encoded frame 1048 is generated by the display encoder 1044. It would thus conserve computer resources if at least portions of the display processing thread (e.g., I/O thread 1014) of the KVM/QEMU 1010 (or the guest OS 1012) could be blocked until such a time that an original frame 1032 is produced by the guest OS 1012 (or Android display system 1034). In this manner, the display processing thread of the KVM/QEMU 1010 (or guest OS 1012) would be synchronized with the output of the Android display system 1034 and unnecessary processing by this thread could be avoided.

Here, the I/O thread 1014 may serve as a display processing thread for the KVM/QEMU 1010. The I/O thread 1014 may handle a variety of different tasks, but may be configured to call the pre-processor 1038 only when an original frame 1032 is produced by the Android display system 1034. In other words, the trigger for a call by the I/O thread 1014 to the pre-processor 1038 may not be based on a timer, but instead triggered by the generation of an original frame 1032. In particular, virtual display driver 1075 may be included in the display control process thread 1070. When an original frame 1032 is generated by the Android display system 1034 (e.g., and before the display control process 1070 blocks based on display control synchronizer 1072) the virtual display driver 1075 may signal or notify I/O thread synchronizer 1076 in I/O thread 1014. The signal or notification may be based on, for example, the writing of the original frame 1032 into a buffer or other memory location designated to hold such an original frame 1032. The I/O thread synchronizer 1076 unblocks the I/O thread 1014 from calling the pre-processor 1038. When the pre-processor 1038 generates a converted frame 1042 based on this original frame 1032, the I/O thread synchronizer 1076 may again serve to block the I/O thread 1014 from calling the pre-processor 1038 until it is once again notified by the virtual display driver 1075 that an original frame 1032 has been generated by the Android display system 1034. In this manner, unnecessary calls to, and processing by, pre-processor 1038 may be avoided and computational resources further conserved.

Frame Rate Governor

From the above description it can be understood that embodiments as disclosed may utilize an I/O thread 1014 that unblocks and processes a new original frame 1032 only when such a frame has been generated by Android display system 1034. In some cases, however, the entire original frame 1032 may be a duplicate of the previously generated frame. Thus, it may be desirable to avoid performing substantially any processing on such duplicate frames, as there is no need to update the display at the physical mobile device 1090 based on the frame generated by the display system 1034 of the corresponding virtual device (as there has been no change in the display).

In particular, a display system (e.g., Android display system 1034) for an operating system (e.g., guest OS 1012) provides methods or APIs for applications to trigger the display system to update the display with the latest content. Sometimes, applications executing on the OS (e.g., guest OS 1012) are “smart” and only update the screen if the application has actually made a change to the screen data. In other cases, applications can be “dumb” and update or refresh the display data continuously with the same data. This smart or dumb behavior may occur, for example, based on the target content of browser based applications.

In a non-virtualized system (e.g., when applications are executing on an operating system running on a physical mobile device), the penalty for “dumb” applications is relatively low. There may be slightly higher power usage on the physical device for this extra processing. On a virtualized system (such as embodiments of a virtualized mobile device platform as disclosed herein), however, computational resources are scarce and thus the penalty for this “dumb” type of display updating by applications can be high in terms of computer resources.

Accordingly, embodiments may include mechanisms to account for such “dumb” applications by reducing the computational resources devoted to the processing of repeatedly generated identical frames by an application. In particular, a governor may be employed that limits the rate of frame processing (e.g., the frame rate) of one or more processes in the video encoder 1036. Specifically, in one embodiment, a governor may be employed in the pre-processor 1038 of the video encoder 1036 that can detect the repetition of frames (e.g., original frames 1032) output from the Android display system 1034. Based on the detection of one or more repeated frames, the frame processing frequency may be retarded. The slowing of the frame processing frequency may be accomplished in a set of graduated stages or states which may be transitioned between based on a number of repeated frames detected, the number of repeated frames detected in a row, the number of repeated frames detected within a particular time period, or a wide variety of other criteria.

To illustrate, for example, typically pre-processor 1038 may process original frames 1032 in a normal manner as they are produced by Android display system 1034. Frame rate governor 1085 may be configured to compare an original frame 1032 to an original last frame 1052 to determine if those frames are substantially identical. Based on the detection, the governor 1085 may increment a counter tracking the number of identical frames received in a row. In one embodiment, the counter may only be incremented if the identical frame is received within a certain time period of the identical last frame. Once the counter reaches a first threshold, the frame rate governor 1085 may transition from an initial or normal state (e.g., state 0) of frame processing to a first stage or state (e.g., state 1). This first threshold may be, in one embodiment, three, such that if three identical frames in a row are received the first state may be entered.

In this first stage, a frame processing frequency may be retarded by, for example, setting a frame rate processing frequency to one frame per 100 ms. In other words, other pre-processing functionality such as ZMV detection 1074, content type detector 1016, color space converter 1018 or other pre-processing activities, may only occur for one frame in a 100 ms time period. During this first stage, governor 1085 may keep comparing received original frames 1032 to original last frame 1052 (and storing the received original frame 1032 to original last frame 1052). If a frame 1032 that is not identical to an original last frame 1052 is received, the governor 1085 may reset the counter (e.g., reset the governor to state 0) and remove any minimum or maximum periods on frame processing (e.g., processing of original frames 1032 will go back to a normal or configured frame rate, or may occur every time based on a trigger caused by the output of an original frame 1032). Additionally, a timer may be set when the first state is entered such that at the expiration of the timer the frame rate governor 1085 may transition back to the initial state (e.g., state 0) or may transition back to the initial state if the identical frame counter is below a threshold or if one or more other criteria are met.

If, however, identical frames continue to be received at the governor 1085 during the first state (e.g., state 1) the counter may continue to be incremented. If the identical frame counter surpasses a second threshold, the frame rate governor 1085 may transition to a second state (e.g., state 2) whereby frame processing may be further throttled. This threshold may be, in one embodiment, five, such that if five identical frames in a row are received the second state may be entered. In this second stage, a frame processing frequency may be throttled by setting a frame rate processing frequency to, for example, one frame per 500 ms. Again, during this second stage, governor 1085 may keep comparing received original frames 1032 to original last frame 1052 (and storing the received original frame 1032 to original last frame 1052). Additionally, a timer may be set when the second state is entered such that at the expiration of the timer the frame rate governor 1085 may transition back to the initial state (e.g., state 0) or the first state (e.g., state 1), or may do so if the identical frame counter is below a threshold or if one or more other criteria are met. As will be realized, the number of stages and the maximum frame processing frequency of each stage are configurable.

FIG. 11 is a flow diagram illustrating one embodiment of a method for implementing a governor for intelligent frame rate restriction in a virtual mobile device platform. Other embodiments are possible and are fully contemplated herein. As discussed, a governor may operate in a number of different states: an initial state (e.g., state 0) where generated frames are processed and one or more throttled states. Thus, the governor may maintain a table or other association between states (e.g., state identifiers), state delays for those states (e.g., 100 ms, 300 ms, 500 ms, etc.) and thresholds indicating the number of duplicate frames to trigger a transition to that state.

At some point then, a new frame may be received, or an event signaling that a new original frame has been generated by the guest OS may be received (STEP 1110). A flag (e.g., “Ignore Screen Processing” flag) may be checked to see if it is set (STEP 1115). If this flag is set (YES branch of STEP 1115) no more processing may be done on the newly generated frame (STEP 1170). The newly generated frame can then be stored as the original last frame.

If, however, the flag is not set (e.g., is “clear”) (NO branch of STEP 1115), the current (newly generated) frame can be compared to the original last frame to determine if the frames are duplicative (STEPS 1120, 1125). If the current frame is not duplicative of the last frame (NO branch of STEP 1125) a duplicate frame counter (e.g., “dup_counter”) may be reset (e.g., to zero) (STEP 1130) and the current frame may be processed (e.g., through the video encoder) (STEP 1175). If, however, the current frame is substantially identical to the last frame (YES branch of STEP 1125) a frame counter (e.g., dup_counter) expiration time may be checked (STEP 1135) to determine if the current frame was received within some expiration time (e.g., one second or the like) of the identical last frame. If the current frame was received within this expiration time period (YES branch of STEP 1135) the frame counter (e.g., dup_counter) may be incremented by one (from its current value) (STEP 1140). If more than this expiration time period has elapsed (NO branch of STEP 1135), the frame counter (e.g., dup_counter) may be set to one (STEP 1145).

Based on the value of the frame counter (e.g., dup_counter) it can be determined (STEP 1150) if a transition should be made to a different or next state. Here, for example, the frame counter (e.g., dup_counter) can be compared to a state entry threshold to determine if a new state should be entered. In particular, this determination may be made based on the current state of the governor (e.g., as indicated by a “state number” identifying the current state of the governor). The current state may be used to identify a next state (if any) that may be entered. As discussed, the governor may be configured with one or more transition thresholds, each transition threshold associated with a state and a count for the frame counter, such that if the frame counter meets or exceeds that threshold the governor should enter that state. Thus, based on the current state (e.g., state 0, state 1, state 2, etc.) a threshold for the number of duplicate frames (as reflected by the frame counter) associated with a next or subsequent state may be determined.

This threshold may be compared with the current value of the frame counter (e.g., dup_counter) to determine if the transition should be made to the next state. If the current value of the frame counter (e.g., dup_counter) does not exceed (or meet or exceed) the threshold for the next state (NO branch of STEP 1150), it can be determined if the current state (as reflected in the current state maintained in the “state number”) is anything other than the initial state (e.g., state “0”) (STEP 1160). If the current state is the initial state (NO branch of STEP 1160) the current frame may be processed (e.g., through the video encoder) (STEP 1175).

If the current state is anything other than the initial state (YES branch of STEP 1160) the ignore processing flag (e.g., “Ignore Screen Processing” flag) may be set along with a reset timer to reset the flag (STEP 1165). The value for this timer may be set based on the current state of the governor (e.g., as indicated by a “state number” identifying the current state of the governor). As discussed, the governor may maintain a configuration associating each state (other than the initial state) with a state threshold and a state delay time. This table can thus be used to determine the state delay for the current state (e.g., 100 ms, 500 ms, etc.) and this state delay added to the current (system) time to set the timer for resetting the ignore processing flag, such that at the time reflected by the reset timer the ignore processing flag will be cleared (e.g., unless the reset timer is changed, cleared, etc.). As the ignore processing flag is set, no more processing may be done on the newly generated frame (STEP 1170). The newly generated frame can then be stored as the original last frame.

If the current value of the frame counter (e.g., dup_counter) exceeds the threshold for the next state (YES branch of STEP 1150), the frame counter (e.g., dup_counter) may be incremented (STEP 1155). The ignore processing flag (e.g., “Ignore Screen Processing” flag) may be set along with a reset timer to reset the flag (STEP 1165). The value for this timer may be set based on the current state of the governor (e.g., as indicated by a “state number” identifying the current state of the governor) as discussed above. As the ignore processing flag is set no more processing may be done on the newly generated frame (STEP 1170). The newly generated frame can then be stored as the original last frame.
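By way of illustration only, the threshold and delay lookup described above might be sketched as follows. The table values, the names STATE_TABLE, Governor and on_duplicate_frame, and the choice to advance the state number when a threshold is met are all assumptions made for this sketch and are not taken from the figures.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical configuration: state number -> (entry threshold in duplicate
# frames, state delay in milliseconds). State 0 is the initial state and has
# neither a threshold nor a delay.
STATE_TABLE = {1: (5, 100), 2: (20, 500)}


@dataclass
class Governor:
    state: int = 0                      # the "state number"
    ignore_processing: bool = False     # the "Ignore Screen Processing" flag
    reset_at_ms: Optional[int] = None   # time at which the flag is cleared


def on_duplicate_frame(gov: Governor, dup_counter: int, now_ms: int) -> bool:
    """Return True if the frame should still go to the encoder."""
    next_state = gov.state + 1
    # Compare the frame counter with the threshold of the next state, if any.
    if next_state in STATE_TABLE and dup_counter >= STATE_TABLE[next_state][0]:
        gov.state = next_state  # assumption: a transition advances the state
    if gov.state == 0:
        return True             # initial state: process the frame normally
    # Any other state: set the flag and arm the reset timer from the table.
    delay_ms = STATE_TABLE[gov.state][1]
    gov.ignore_processing = True
    gov.reset_at_ms = now_ms + delay_ms
    return False
```

In this sketch a longer run of duplicate frames moves the governor into states with longer delays, so fewer frames reach the encoder while the screen is idle.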

Intra Frame Content Type Detection

As discussed, in many cases it is desirable to utilize a display encoder that operates according to an encoding protocol designed for use with video data, such as block based encoders (including, for example, encoders that operate according to the H.264 standard). This desirability may stem at least in part from the ability of certain video encoders to utilize data from previous frames or to use hardware acceleration capabilities provided on physical mobile devices. However, entropy or block based encoders of these types may not necessarily function efficiently or achieve good results when utilized on image or graphic type data, especially such data that includes non-uniform or non-linear changes. This is the case because, from the point of view of digital displays consisting of a number of individual pixels, the methods and performance for compressing this data are highly dependent on the gradient (rate of change of color and brightness) across the display. These are precisely the types of data that occur with respect to display data utilized on a mobile device, as this data often includes vector graphics or text having non-linear changes (e.g., to contrast with a background). Content such as text or vector graphics quickly changes from very dark to very bright colors across just a few pixels, whereas image type data on average changes very gradually in brightness or color across just a few pixels. Given this, to achieve better compression performance in a virtual mobile device platform, it may be desirable to detect the content type for each frame of display data (or macroblocks thereof) such that different compression may be applied to different types of macroblocks. Embodiments may utilize two methods to detect the display data content type.

In one embodiment, guest OS 1012 may include customized code (e.g., in the user interface and graphics stack layers) in order to capture high level application and system calls that provide display content type information about the areas of the screen they are modifying. For example, in Android, an application can use Android's View classes in order to modify display data. Some view types are also specific to content types, such as the ImageView class or the TextView class. Each view is ultimately rendered to modify a particular portion of the display, so by capturing the view class type, the screen area of the view, or other layout and rendering information in the system, the type of content for some areas of the screen can be determined. In another embodiment, to detect high-frequency macroblocks (e.g., text or vector graphics) the frame data itself may be analyzed to differentiate between low-frequency macroblocks with smooth gradients (e.g., image content) and high-frequency macroblocks with sharp gradients (e.g., text and vector graphics).

Once the low-frequency and high-frequency macroblocks have been identified, an appropriate compression can be applied to macroblocks of different types, where the compression algorithm applied may be optimized or perform (relatively) better for that content type. This allows for a compression result that may be smaller in data size and higher in quality (particularly for text and vector graphics), and that may require fewer processing resources.

Referring to FIGS. 10A and 10B then, as discussed pre-processor 1038 may include a color space converter 1018 to convert the original frame 1032 in the RGB color space to the YUV color space of converted frame 1042. The converted frame 1042 in the YUV color space may then be analyzed by content type detector 1016 to determine, for each macroblock of the converted frame 1042, if that macroblock is a high-frequency macroblock (e.g., a macroblock that contains text or vector graphics). Content type detector 1016 may also determine which macroblocks are low-frequency macroblocks (e.g., by analysis or by identifying all macroblocks as low-frequency that are not identified as high-frequency macroblocks). This identification of the type of macroblocks of the converted frame 1042 (or an identification of the high-frequency macroblocks) may be stored, for example, in current frame metadata 1058. Thus, current frame metadata 1058 may include an association between each macroblock of the current frame and an identifier indicating if that macroblock is a low-frequency or a high-frequency macroblock.

When the display encoder 1044 is invoked to encode the converted frame then, the application programming interface (API) of the display encoder 1044 (which may adhere to, or provide an interface substantially similar to, those defined in the H.264 specification or another block based encoder specification) may be provided with the identities of the identified high-frequency macroblocks and an instruction not to encode those identified high-frequency macroblocks. In particular, the metadata 1058 for the current frame may include a list or array with one entry for each macroblock on the screen, each entry holding a Boolean value to identify whether or not that macroblock is of a high-frequency type. The H.264 encoder API for encoding the frame may have this object type pointer as one of its parameters. Thus, the display encoder 1044 may produce an H.264 frame with the low-frequency macroblocks of the converted frame 1042 encoded according to the H.264 standard (or similar) and the high-frequency macroblocks in an uncompressed format. Alternatively, the display encoder 1044 may be instructed to encode these high-frequency macroblocks with modified parameters, such as a smaller step size.

At this point, if the high-frequency macroblocks were not encoded, the encoded frame (e.g., with the low-frequency macroblocks of the converted frame 1042 encoded according to the H.264 standard (or similar) and the high-frequency macroblocks in an uncompressed format) may be passed to high-frequency encoder 1059. The high-frequency encoder 1059 obtains the uncompressed data for the high-frequency macroblocks from the encoded frame and applies a separate encoding algorithm (e.g., which may be a lossless or lossy encoding algorithm such as zip, etc.) to these high-frequency macroblocks to generate encoded data for these high-frequency macroblocks. Encoded frame 1048 may then be assembled by combining the encoded low-frequency macroblocks (e.g., encoded according to the H.264 standard) and the encoded high-frequency macroblocks (e.g., encoded according to the separate encoding algorithm (e.g., zip)).

It will be apparent that since these high-frequency macroblocks are encoded according to a separate encoding algorithm the encoded frame 1048 may not adhere completely to the H.264 standard. It may, however, be desired to take advantage of specialized hardware or software on the physical mobile device 1090 designed to process H.264 encoded data. Thus, when the encoded frame 1048 is transmitted to client application 1030 it will be wrapped with header information identifying the high-frequency macroblocks within the encoded frame 1048. When the encoded frame 1048 arrives at the client application 1030 these encoded high-frequency macroblocks may be decoded according to the separate encoding algorithm (e.g., unzipped) implemented by high-frequency decoder 1097 and the decoded (or raw) data for these high-frequency macroblocks combined with the encoded low-frequency macroblocks to create a frame compliant with the H.264 specification so that it may be processed by the video decoder 1094 on the physical mobile device 1090 configured to operate on H.264 compliant data.
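The wrap and unwrap steps described above can be sketched as follows. This is only a minimal illustration: zlib (DEFLATE) stands in for the separate encoding algorithm, low-frequency macroblocks are passed through untouched as a stand-in for H.264 encoded data, and the function names and the list-of-indices header are hypothetical.

```python
import zlib


def pack_high_frequency(blocks, is_high_freq):
    """Server side: compress only the macroblocks flagged as high-frequency.

    blocks       -- list of bytes objects, one per macroblock
    is_high_freq -- list of booleans from the frame metadata
    Returns (header, packed), where header lists the indices of the
    separately encoded macroblocks.
    """
    header = [i for i, hf in enumerate(is_high_freq) if hf]
    packed = [zlib.compress(b) if hf else b
              for b, hf in zip(blocks, is_high_freq)]
    return header, packed


def unpack_high_frequency(header, packed):
    """Client side: restore the raw high-frequency macroblocks."""
    separately_encoded = set(header)
    return [zlib.decompress(b) if i in separately_encoded else b
            for i, b in enumerate(packed)]
```

Because the header travels with the frame, the client can restore the raw high-frequency data before handing the reassembled frame to its decoder.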

In this manner, computing resources on the virtual mobile device system may be conserved as the display encoder 1044 is relieved of performing computationally intensive compression (e.g., entropy steps or the like) of high-frequency macroblocks which, in any event, would not accomplish satisfactory compression results. Moreover, network latency is reduced as this high-frequency macroblock data can be efficiently compressed by a separate encoding algorithm to reduce the amount of display data that may need to be transmitted to the physical mobile device 1090 while still taking advantage of any hardware or software on the physical mobile device configured to accelerate the decoding of H.264 data.

Embodiments will now be explained in more detail. Recall from above that traditional image and video encoders do not encode text and vector graphics in an optimum way. This is the case at least because the basic assumption of most compression techniques is that the highest energy is concentrated at the lowest frequencies (see, e.g., the discussion of Discrete Cosine Transform (DCT) Energy Distribution below). This is true for all practical purposes for video and images as there is a continuum of image data. For text and vector graphics, which involve sharp lines, sharp jumps in frequency components appear, which essentially represent high frequency components. An attempt to compress this structure efficiently using traditional image and video compression techniques would require either using a smaller step size or leaving these areas uncompressed to allow some other type of compression technique to be applied to them.

The current mechanisms for most image/video processing algorithms consider an average energy of the block or some weighting techniques to decide the Qp (quantization step). This works very well for real-life images and videos, but fails to adequately address text content or vector graphics. To optimally (or better) handle vector graphics (e.g., sharp lines) or text data, a method to identify these types of data is utilized by embodiments. A key question then is how to distinguish between text/vector graphics data and regular image data. In the simplest form, these would represent energies on the higher side of the spectrum versus the lower side of the spectrum.

Based on these observations, in one embodiment the following may be applied to text or vector graphics. Initially, spectral differences of the upper and lower half of the macroblock in the frequency domain may be determined. These differences or this determination may be weighted. If the energy of the upper half of the spectrally represented macroblock is higher than that of the lower half of the spectrally represented macroblock by a certain amount (e.g., a delta or threshold difference), then the macroblock is designated as a high-frequency macroblock (e.g., it represents text/vector graphics) and is identified as such in current frame metadata 1058. As discussed, the identified high-frequency macroblock can then be, for example, encoded using a smaller step size in the video encoder or left uncompressed, allowing an additional stage of data encoding after the video or image encoding step that is better suited to encode high frequency data.

Referring now to FIG. 12, a block diagram illustrating one embodiment of the detection of high-frequency macroblocks is depicted. The different methodologies involved in this embodiment are computation of DCT, zig-zag scanning and energy distribution. Thus, a DCT is applied to macroblock 1210 (STEP 1212) resulting in a DCT image 1220 including the DCT coefficients in the frequency domain. Next a zig-zag scan is performed (STEP 1222) on the DCT image 1220 resulting in an ordered DCT coefficient array 1230. This ordered DCT coefficient array 1230 can then be processed (STEP 1232) to calculate an energy distribution for the macroblock. Based on the energy distribution the macroblock can be identified as a high-frequency (or low-frequency) macroblock.

DCT Transform

In general, DCT expresses data as a sum of cosine functions to reduce the size of the data. The basic transform, the ‘core transform’, is a 4×4 or 8×8 integer transform, a scaled approximation to the Discrete Cosine Transform (DCT). In this technique, DCT is applied on 16×16 macroblock data, which ultimately calls a 4×4 DCT on 16 sub-blocks.
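As a rough illustration of applying a 4×4 transform to the 16 sub-blocks of a 16×16 macroblock, the sketch below uses the widely documented H.264 forward core transform matrix. Whether embodiments use exactly this matrix is an assumption, the scaling normally folded into quantization is omitted, and the function names are hypothetical.

```python
# Forward core transform matrix commonly given for H.264 (an integer
# approximation of a 4x4 DCT). Its use here is an assumption of the sketch.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul4(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def core_transform_4x4(block):
    """Compute Y = CF * X * CF^T for one 4x4 block (no scaling applied)."""
    cf_t = [list(col) for col in zip(*CF)]
    return matmul4(matmul4(CF, block), cf_t)


def transform_macroblock(mb):
    """Transform each 4x4 sub-block of a 16x16 macroblock, in raster order."""
    out = []
    for by in range(0, 16, 4):
        for bx in range(0, 16, 4):
            sub = [row[bx:bx + 4] for row in mb[by:by + 4]]
            out.append(core_transform_4x4(sub))
    return out
```

For a perfectly flat macroblock, all of the energy of each sub-block lands in the DC position (0,0), which is the behavior the energy analysis below relies on.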

One Embodiment Zig-Zag Operation:

-   After the DCT operation, the significant DCT coefficients of the macroblock are typically the low frequency coefficients, which are clustered around the DC (0,0).
-   In the H.264 encoder, prior to entropy coding, scanning is performed on quantized DCT coefficients. The main purpose of the scan is to re-order the coefficients and cluster the non-zero coefficients together, which enables optimum representation of the remaining zero-valued quantized coefficients.
-   In a typical frame, non-zero coefficients are mostly clustered towards the top-left corner (also known as DC) and a zig-zag scan is performed.
-   In the given example, DCT is applied on 4×4 blocks, which generates a total of 16 DCT coefficients, and the zig-zag scan starts from the DC coefficient and then proceeds through the AC coefficients in a zig-zag pattern as shown in FIG. 13. The pattern in FIG. 13 illustrates the order in which the scan may be performed on the blocks.
-   After performing the zig-zag, low frequency coefficients tend to occupy space near the start of the array (as indicated by the dotted blocks in FIG. 14) and high frequency coefficients are clustered towards the end of the array (marked by diagonal lines in FIG. 14). DCT coefficients are re-ordered in the linear array as shown in FIG. 14. Thus, FIG. 14 illustrates the result 1420 of a zig-zag scan as illustrated in FIG. 12 as applied to blocks 1410.

DCT Energy Distribution

Energy is used to describe a measure of “information” in a block. The energy distribution gives the contribution of the individual transformed coefficients. The energy of the DCT coefficients may be computed by the following equation.

$$\text{Energy} = \sum_{y=1}^{N} \sum_{x=1}^{M} \left|\text{coeff}(x,y)\right|^{2}$$

Where coeff(x,y) represents the macroblock in the transform domain.

It may be desired to avoid the first (DC) coefficient in the energy calculation as it contains the maximum energy.

N×M is the size of the DCT coefficients matrix.

Normalized energy = Energy / (total number of coefficients taken to find the energy)

To normalize the energy distribution, divide the total energy by the number of coefficients taken into account for finding the energy. To weight the DCT coefficients, some threshold may be taken into consideration.

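For example, for a 4×4 block there are 16 DCT coefficients and if scan_threshold=0.3, then

scan_width = 0.3 × 16 = 4.8, which is rounded down to 4, and the normalized energy distribution is calculated using the following equations. Here x indexes the 16 zig-zag ordered coefficients from 0 to 15, so that the first 4 coefficients contribute to the low frequency energy and the remaining 12 to the high frequency energy, and block_width is 16.

$$\text{Low\_coeff\_energy} = \frac{\sum_{x=0}^{3} \left|\text{coeff}(x,y)\right|^{2}}{\text{scan\_width}}$$

$$\text{High\_coeff\_energy} = \frac{\sum_{x=4}^{15} \left|\text{coeff}(x,y)\right|^{2}}{\text{block\_width} - \text{scan\_width}}$$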

After processing all the blocks, compare the energy distribution. If the low frequency coefficients have more energy compared to the high frequency coefficients then declare this macroblock as a low-frequency macroblock, else identify it as a high-frequency macroblock.
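The zig-zag re-ordering and the normalized low and high energy calculation for a single 4×4 block can be sketched as follows. The scan order used is the conventional 4×4 zig-zag order and is assumed, not confirmed, to match FIG. 13; the function names and the skip_dc option are hypothetical.

```python
# Conventional 4x4 zig-zag order, expressed as raster indices (row * 4 + col).
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def zigzag(block):
    """Re-order a 4x4 coefficient block into a 16-element linear array."""
    flat = [v for row in block for v in row]
    return [flat[i] for i in ZIGZAG_4X4]


def split_energy(coeffs, scan_factor=0.3, skip_dc=False):
    """Return (low, high) normalized energies of zig-zag ordered coefficients."""
    scan_width = int(len(coeffs) * scan_factor)   # e.g. int(4.8) == 4
    start = 1 if skip_dc else 0                   # optionally leave out DC
    low = coeffs[start:scan_width]
    high = coeffs[scan_width:]
    # Normalize by the number of coefficients actually summed.
    low_energy = sum(c * c for c in low) / max(len(low), 1)
    high_energy = sum(c * c for c in high) / max(len(high), 1)
    return low_energy, high_energy
```

A block whose non-zero coefficients sit near the top-left corner yields a large low energy and a small high energy, while a block with sharp edges spreads energy into the tail of the array.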

FIGS. 15A and 15B depict a flow diagram for one embodiment of a method for detecting energy distribution in a macroblock. As can be seen, this embodiment of the method functions as follows: if the macroblock is of inter type, it performs a DCT on the (e.g., 16×16) block. The DCT operates on sub-blocks (e.g., 4×4 sub-blocks) and gives 16 coefficients for each, of which the DC is located at position (0,0). The zig-zag scan arranges the DCT coefficients in order from low frequency to high frequency (e.g., in a linear array). For example, array[0]=DC component, array[1]=low(est) frequency and so on, with array[15]=highest frequency coefficient. Then the method calculates the energy distribution as described above on each of the (e.g., 4×4) sub-blocks, avoiding the DC component. If the low frequency count is more than the high frequency count for that 16×16 macroblock, that macroblock is identified as a low frequency macroblock; otherwise it is identified as a high frequency macroblock. While embodiments herein are illustrated with respect to a 16×16 macroblock and 4×4 sub-blocks, other embodiments will be generally applicable to macroblocks and sub-blocks of other sizes.

Specifically, the method may be called with the array of DCT coefficients for the current macroblock (STEP 1505). As discussed, these DCT coefficients may be arranged as (e.g., 4×4) sub-blocks. Initially, a scan threshold may be determined based on the width of the macroblock and a scan factor (e.g., “f_scan_factor”). A height tracking index variable (y) and a width tracking index variable (x) may also be initialized along with variables for the low frequency count and the high frequency count (STEP 1510).

In particular, the f_scan_factor may indicate a proportion of low frequency and high frequency coefficients that should be considered in the calculation of energy distribution. The scan_threshold maps the f_scan_factor to a threshold value for the DCT structure (e.g., linear array). The scan_threshold indicates how many coefficients should be considered from the DCT data structure (e.g., the linear array) to calculate the low frequency coefficients' energy. For example, if a sub-block is 4×4 pixels, there may be a total of 16 DCT coefficients stored in the data structure (e.g., linear array) of DCT coefficients for the sub-block. If the f_scan_factor is 0.3 then the scan threshold may be 16*0.3=4.8 (which may in some embodiments be rounded down to 4).

It can then be determined if the height tracking variable (y) is less than the block height (e.g., the number of pixels of a macroblock in the Y direction) (STEP 1514). If the height tracking variable (y) is less than the block height (YES branch of STEP 1514) it can then be determined if the width tracking variable (x) is less than the scan threshold (STEP 1516). If the width tracking variable (x) is less than the scan threshold (YES branch of STEP 1516) the low energy coefficients may be used to calculate the low frequency energy (e.g., for the sub-block) (STEP 1518). For example, if the DCT array has 16 coefficients corresponding to the macroblock, the first number of coefficients associated with the scan_threshold (e.g., 4 in the instant example) may be used to calculate the low frequency energy using the sum of squares method. Thus, while the width tracking variable (x) is less than the scan threshold, the low frequency energy of these coefficients may be summed and the width tracking variable (x) incremented until the width tracking variable (x) is equal to the scan threshold (NO branch of STEP 1516).

At the point the width tracking variable (x) is equal to (or greater than) the scan threshold (NO branch of STEP 1516) it can be determined if the width tracking variable (x) is less than the block width (e.g., the width of the macroblock) (STEP 1520). While the width tracking variable (x) is less than the block width (YES branch of STEP 1520) the high energy coefficients may be used to calculate the high frequency energy (STEP 1522). For example, if the DCT array has 16 coefficients corresponding to the macroblock, the coefficients from 5-16 (e.g., in the case where the scan_threshold is 4) may be used to calculate the high frequency energy using the sum of squares method. Thus, while the width tracking variable (x) is less than the block width, the high frequency energy of these coefficients may be summed and the width tracking variable (x) incremented until the width tracking variable (x) is equal to the block width (NO branch of STEP 1520).

At the point the width tracking variable (x) is equal to (or greater than) the block width (NO branch of STEP 1520), the high frequency energy and the low frequency energy may be normalized by dividing by the number of coefficients utilized to determine each (e.g., 4 or 12) (STEP 1524). The normalized low frequency energy may then be compared to the normalized high frequency energy (STEP 1542). If the low frequency energy is greater than the high frequency energy (YES branch of STEP 1542) the low frequency count may be incremented (STEP 1544). Otherwise (NO branch of STEP 1542), the high frequency count may be incremented (STEP 1546). The height tracking variable (y) may then be incremented (STEP 1548), the width tracking variable (x) reset (e.g., to 0) (STEP 1552) and the height tracking variable (y) again compared against the block height (STEP 1514) (e.g., to determine if the last sub-block has been processed).

If the height tracking variable (y) is equal to (or greater than) the block height (NO branch of STEP 1514), the low frequency count may then be compared to the high frequency count (STEP 1526). If the low frequency count is greater than the high frequency count (YES branch of STEP 1526) the macroblock may be designated as a low frequency macroblock (STEP 1528). Otherwise (NO branch of STEP 1526), the macroblock may be designated as a high frequency macroblock (STEP 1530).
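The per-sub-block counting and the final designation can be sketched as below. This is a simplified illustration that takes one zig-zag ordered coefficient list per sub-block, rather than following the x and y index variables of FIGS. 15A and 15B step by step; the function name, the input layout and the string return values are assumptions of the sketch.

```python
def classify_macroblock(sub_blocks, f_scan_factor=0.3):
    """Designate a macroblock 'low' or 'high' frequency by sub-block voting.

    sub_blocks -- one zig-zag ordered coefficient list per sub-block
                  (e.g., 16 lists of 16 coefficients for a 16x16 macroblock)
    """
    low_count = 0
    high_count = 0
    for coeffs in sub_blocks:
        scan_threshold = int(len(coeffs) * f_scan_factor)  # 16 * 0.3 -> 4
        low = coeffs[1:scan_threshold]    # index 0 (DC) is left out
        high = coeffs[scan_threshold:]
        low_energy = sum(c * c for c in low) / max(len(low), 1)
        high_energy = sum(c * c for c in high) / max(len(high), 1)
        if low_energy > high_energy:
            low_count += 1
        else:
            high_count += 1               # ties fall to the high count
    return "low" if low_count > high_count else "high"
```

Note that, as in the flow described above, a sub-block whose low frequency energy does not strictly exceed its high frequency energy adds to the high frequency count, so a macroblock with evenly split counts is designated high frequency.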

Combined Copy and Conditionally Compare

Returning now to FIGS. 10A and 10B, another way to conserve computing resources on the virtual mobile device platform may be to avoid processing parts of a frame that have not changed from the previous frame. In particular, in some cases original frame 1032 stored in a buffer or memory location may have various areas (e.g., macroblocks) that did not change at all from the previously output frame 1052. If areas of the original frame 1032 have not changed from the original previous frame 1052 it may be possible to bypass some processing for these areas of the original frame 1032 when producing encoded frame 1048, including for example pre-processing 1038 of the duplicative area(s) (e.g., processing of the area by content-type detector 1016, or color space converter 1018) or encoding of the unchanged area(s) by display encoder 1044. Instead, a corresponding area of the original previous frame 1052 (e.g., in the YUV color space) or an encoded version of the corresponding unchanged area from encoded previous frame 1057 may be utilized to generate the current encoded frame 1048. By bypassing this processing, computing resources may be conserved and the generation of encoded frame 1048 accelerated.

Recall from the above that the pixel data of frames is organized into square subsections of pixels called macroblocks. For example, a 16×16 pixel size is a typical macroblock size for the H.264 codec. When the original frame 1032 is received by pre-processor 1038, ZMV detector 1074 may operate on the macroblocks of the original frame to determine macroblocks of the original frame 1032 that have not changed from the previous frame 1052. Because the data in these macroblocks does not change, from a motion detection point of view, macroblocks with only unchanged data are referred to as zero motion vector (ZMV) macroblocks. The identified ZMV macroblocks may be identified in current frame metadata 1058. Such an identification may be in addition to, in lieu of, or may include, an association between each macroblock of the current frame and an identifier indicating if that macroblock is a low-frequency or a high-frequency macroblock.

In particular, in one embodiment, ZMV macroblocks of the original frame may be determined by including customized code in the guest OS 1012 (e.g., in the user interface (UI) and graphics stack layers) in order to capture high level application and system calls that provide display content type information about the areas of the screen they are modifying. Unmodified areas found using this method may incur zero additional processing cost to find.

Alternatively, a comparison between the current frame 1032 in memory (e.g., a buffer) and the original previous frame 1052 in memory (e.g., a buffer) may be utilized to identify ZMV macroblocks of the current original frame 1032. Certain embodiments may implement such a memory comparison in an efficient manner. First, while the comparison units are 16×16 (or other dimension) pixel macroblocks (e.g., macroblocks are compared between the current frame 1032 and the original previous frame 1052), line by line comparisons may be used to take advantage of typical hardware cache line sizes and mechanisms. Secondly, hand coded assembly using x86 extension operations available at runtime may also be used. Additionally, remember that since the original last frame 1052 is maintained, at some point the current original frame 1032 must be copied (e.g., from a frame buffer containing the current original frame 1032 to a location or buffer for storing the original last frame 1052). By combining the comparison operation, where a macroblock of the current original frame 1032 is compared with the corresponding macroblock of the original last frame 1052, with a copy operation to replace the corresponding macroblock of the original last frame 1052 with the macroblock of the current original frame 1032, embodiments may reduce computational resources and overhead.

Specifically, in certain embodiments an OS running in a virtualized environment (e.g., guest OS 1012) actually has a display device frame buffer whose memory was allocated in the hypervisor (e.g., KVM/QEMU 1010) (or on the guest OS 1012). This may be the frame buffer where original frame 1032 is stored. From the point of view of the hypervisor display processing functionality (e.g., I/O thread 1014), this frame buffer memory may be written to by the virtual OS display system (e.g., Android display system 1034) at any time, and thus is unstable. Accordingly, to reliably make a comparison of the current frame buffer contents (e.g., original frame 1032) to the previous frame buffer that was processed, one must keep around an additional frame buffer which holds the data from the previous frame. Original last frame 1052 may be stored in such a frame buffer. While making an extra copy of the entire frame buffer is typically not computationally cheap, by combining a frame buffer copy step with the compare step (e.g., using embodiments of assembly code) the cost required to perform such copy and compare may be shared amongst the operations, thus reducing the total cost of doing both.

Particular embodiments of a macroblock compare-and-conditionally-copy functionality, which performs the memory comparison between all the macroblocks in the previous and current frame to determine which macroblocks contain modified data and which macroblocks remain unchanged (e.g., are ZMV macroblocks), will now be discussed in more detail with reference to FIGS. 16-20. This information (e.g., identifying which macroblocks are or are not ZMVs) may be stored in a container called the mb_info_hyp array (or, interchangeably, the mb_info_array or mb_info array) which may be provided to other components (e.g., color space converter 1018, content type detector 1016, display encoder 1044, etc.) in the display pipeline to increase quality and performance of this pipeline. As display encoder 1044 and other components of the pipeline may be configured to operate on macroblock units of a base pixel size (e.g., 16×16 pixel for H.264), pre-processor 1038 may also operate on frame data organized into macroblock units such that data on these macroblock units may be passed to other components (e.g., display encoder 1044).

As discussed, the macroblock comparison between the previous and current frame (e.g., to determine ZMV macroblocks) may be extremely useful, as the identity of the ZMV macroblocks may be used to bypass expensive non-beneficial or unneeded operations in the rest of the display pre-processing and the display encoding. In the pre-processing stage, in certain embodiments expensive color space conversion operations may be bypassed for macroblocks identified as unmodified. For the display encoder, expensive operations at multiple stages of the encoding pipeline may be bypassed and data from the previously encoded frame (e.g., as contained in encoded previous frame 1057) used instead. It will be noted that while certain embodiments may bypass color space conversion or display encoding based on identification of ZMV macroblocks according to embodiments of ZMV identification as disclosed herein, the identification of ZMV macroblocks may be done according to almost any methodology desired in conjunction with embodiments of bypassing color space conversion or display encoding as disclosed herein.

While block based display encoders (e.g., those used for H.264 based encoders) typically attempt to re-use data from the previous screen as much as possible in their encoding of the current frame, these typical approaches are less optimal and less accurate than embodiments of the macroblock compare-and-conditionally-copy method as disclosed. To illustrate in more detail, a straightforward and obvious solution for comparing macroblocks between the current and previous frame would be to compare each pixel row of the macroblock from top to bottom.

But this approach is inefficient, as illustrated with respect to FIG. 16. Suppose for example a frame 1610 is of size 144×192 pixels. When this frame 1610 is organized into macroblocks having a size of 16×16 pixels the total number of macroblocks for frame 1610 is 108 (where MBX=9 and MBY=12, or 9×12). For example, the 16×16 pixels of macroblock number (8, 12) 1620 are expanded as an array from 0 to 15 pixels in both directions (X and Y). The data read pattern is illustrated as shown, starting with pixel (1,1) of the array and proceeding down row 1 until pixel (1, 15). The scan can then start on row 2 beginning with pixel (2, 1), etc. Using a programmatically straightforward approach, the comparisons go to the next scan line after every 16 pixels to load the data. Proceeding in this manner violates or goes against the cache direction (as the cache may have stored all the data for that row). Hence, scanning in this manner may increase the cache miss ratio. Accordingly, these types of scans may incur a significant performance penalty for an overall macroblock comparison operation.

As illustrated in FIG. 17, a major efficiency gain of embodiments of the compare and conditionally copy methodology as disclosed is achieved by processing data along the display scanline boundary versus the more obvious boundary of the memory of a particular macroblock. In particular, embodiments may compare 16 pixels (16×4=64 bytes), one macroblock in the X direction. In other words, one macroblock of the current frame may be compared to one macroblock in the previous frame, proceeding in the horizontal (or X) direction. If this comparison finds that in the first row itself (of 16 pixels) there is a mismatch in the data in the previous frame being compared, the comparison may be skipped for the remaining pixels in all rows for that macroblock. The comparison can then proceed to the next macroblock in the X direction (if possible) or the first macroblock of the next row (if possible). In particular, a pointer being used to point to the current macroblock(s) of the frames being compared may be updated to the next macroblock in the X direction (if possible) or the first macroblock in the next row (if possible). To get the benefit of cache, macroblocks may be compared between the two frames (current frame and previous frame) in a horizontal direction only.

Processing data along the display scanline boundary may achieve efficiencies because the data along the display scanline is contiguous, and thus allows for the most efficient access by CPUs due to the standard memory caching architectures that are optimized for such contiguous memory access. While this may be more programmatically complex for the comparison of macroblocks, the operational efficiency gains are large. More specifically, as discussed in embodiments of the virtual mobile device platform, a virtual mobile OS (e.g., Android OS 1012) is running under a hypervisor which provides a virtual display device (e.g., I/O thread 1014) to which the virtual mobile OS display system 1034 renders its RGB display output. The display handling hypervisor code then processes and compresses this display data to be sent out to remotely connected devices 1090, often connecting over bandwidth limited connections.

In certain embodiments, the current rendered frame (e.g., original frame 1032) output by the virtual mobile OS (e.g., Android OS 1012) uses a single frame memory buffer (e.g., to store original frame 1032). In order to perform the macroblock comparison, a copy of the previous frame buffer must also be kept (e.g., original last frame 1052). Accordingly, to compare the data from the frames, the copy functionality may be integrated with the comparison of the frames such that only modified data is immediately copied from the current frame buffer to the previous frame buffer memory location for the pixel data that was just compared. This allows for an efficient method of copying this data due to a) only copying data that changed, and b) re-using cached data that was used for the compare operations. Re-using the cached data avoids expensive cache misses that would result in inefficient external memory fetch operations. These integrated compare-and-conditionally-copy operations do not exist in standard libraries (e.g., C libraries).
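By way of illustration only, the integrated operation on a single macroblock scanline might be sketched in C as follows. This is a minimal sketch under stated assumptions and not the implementation of FIGS. 18-20: the function name compare_and_copy_row and the constant MB_ROW_BYTES are hypothetical names introduced here, and 32-bit pixels are assumed. The point of the sketch is that the copy is issued immediately after the compare, on the same 64 bytes, so the data just read for the comparison is likely to still be cached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constant: bytes in one macroblock scanline (16 pixels x 4 bytes). */
enum { MB_ROW_BYTES = 16 * 4 };

/*
 * Compare one macroblock scanline of the current frame (cur) against the
 * previous frame copy (prev). If the two differ and copy_flag is set, the
 * previous-frame copy is refreshed right away. Unchanged data is never copied.
 * Returns true when the two scanlines are identical.
 */
static bool compare_and_copy_row(unsigned char *prev, const unsigned char *cur,
                                 bool copy_flag)
{
    assert(prev != NULL && cur != NULL);

    if (memcmp(prev, cur, MB_ROW_BYTES) == 0)
        return true;                      /* unchanged: nothing to copy */

    if (copy_flag)
        memcpy(prev, cur, MB_ROW_BYTES);  /* changed: update the reference copy */

    return false;
}
```

A caller would invoke such a helper once per macroblock scanline while walking the frame in the X direction.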

FIGS. 18-20 are flow diagrams illustrating embodiments of methods for a combined copy and compare. These flow diagrams illustrate embodiments of implementations for efficient macroblock compare-and-conditionally-copy functionality of 32-bit RGB data. These methods are similarly applicable to other formats such as 16-bit or 24-bit or 64-bit RGB data formats (among others). Generally, in certain embodiments an array for the macroblocks (e.g., mb_info_hyp array) is initialized with value 1, and it is assumed that all the macroblocks are unmodified/static. Next, the total static count is calculated and initialized to the number of all the macroblocks in the frame. Next, reference and current pointers may be loaded into registers. A main outer loop runs (e.g., index i) on the total number of macroblocks in the Y direction. An inside loop is run for 16 rows (one macroblock size). A last loop (e.g., index j) will be run on the number of macroblocks in the X direction. The total number of macroblocks may always be processed in the X direction, which is the cache direction, before moving to the next scanline. By scanning in the X direction the cache miss ratio is reduced and hence the CPU performance increased.

Embodiments of the method check the mb_info_hyp[j] array entry and, if the value is equal to 1, this macroblock has not yet been processed (or has not yet been found to be modified), so the 64 bytes (16 pixels × 4 bytes/pixel) for this scanline of the macroblock are compared with the reference frame. If the comparison is successful, both the pointers to the macroblocks being compared are incremented, and the method goes to the next macroblock in the X direction and continues until the last macroblock in the X direction is processed. If the comparison fails, the value in the mb_info_hyp[j] array is reset to 0 for this macroblock, the total static count (e.g., number of unchanged macroblocks between the frames) is decremented and both the pointers are incremented to move to the next macroblock scan. If the comparison fails and a calling application has indicated the macroblocks should be copied, the data is copied back from the current frame pointer to the reference pointer for this macroblock position.

If all the macroblocks are processed in the X direction, the method moves to the next row of pixels and again checks the mb_info_hyp array value. If that value of mb_info_hyp is 0, it indicates that an earlier row comparison failed for this macroblock, so the method avoids comparison for the remaining rows of this macroblock and moves to the next macroblock. If all 16 rows are completed, the method increments the macroblock count in the Y direction (e.g., index i) and, if the macroblock count in the Y direction (e.g., index i) reaches the total number of macroblocks in the Y direction, it returns the total number of static macroblocks to the caller of the method.
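The general flow described above can be sketched in portable C as follows. This is a hedged illustration rather than the code of any embodiment: the function name mb_compare_and_copy, the parameter names, and the use of one flat mb_info array covering the whole frame (with the row for index i located at mb_info + i * totalmb_x) are assumptions made for the example, as is a stride expressed in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MB_SIZE = 16, BYTES_PER_PIXEL = 4,
       MB_ROW_BYTES = MB_SIZE * BYTES_PER_PIXEL };   /* 64 bytes */

/*
 * ref: previous frame copy, cur: current frame, stride: bytes per scanline.
 * mb_info must hold totalmb_x * totalmb_y entries; on return 1 = static and
 * 0 = dirty. Returns the number of static macroblocks.
 */
static int mb_compare_and_copy(uint8_t *ref, const uint8_t *cur, size_t stride,
                               int totalmb_x, int totalmb_y,
                               uint8_t *mb_info, int copy_flag)
{
    assert(stride >= (size_t)totalmb_x * MB_ROW_BYTES);

    int static_mbcount = totalmb_x * totalmb_y;   /* assume all static */
    memset(mb_info, 1, (size_t)static_mbcount);

    for (int i = 0; i < totalmb_y; i++) {          /* macroblock rows (Y) */
        uint8_t *row_info = mb_info + (size_t)i * totalmb_x;

        for (int t = 0; t < MB_SIZE; t++) {        /* 16 scanlines per row */
            size_t line = ((size_t)i * MB_SIZE + t) * stride;
            uint8_t *a = ref + line;               /* reference pointer */
            const uint8_t *b = cur + line;         /* current pointer */

            for (int j = 0; j < totalmb_x; j++) {  /* macroblocks in X */
                /* Compare only while the macroblock is still presumed static. */
                if (row_info[j] == 1 && memcmp(a, b, MB_ROW_BYTES) != 0) {
                    row_info[j] = 0;               /* mark dirty */
                    static_mbcount--;
                }
                /* Dirty macroblock: copy this scanline without comparing. */
                if (copy_flag && row_info[j] == 0)
                    memcpy(a, b, MB_ROW_BYTES);

                a += MB_ROW_BYTES;                 /* next macroblock in X */
                b += MB_ROW_BYTES;
            }
        }
    }
    return static_mbcount;
}
```

Scanlines of a macroblock that precede its first mismatch are already identical in both buffers, so copying only from the first mismatching scanline onward leaves the reference buffer equal to the current frame for that macroblock.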

FIGS. 18A and 18B depict an embodiment of a non-platform specific implementation using standard C library functions. Additionally, FIGS. 19A and 19B and 20A and 20B illustrate embodiments that utilize x86 platform specific functionality to more efficiently perform these operations by using Intel x86 extensions in the SSE2 and AVX2 extension family. SSE2 is supported by most x86 machines in operation today and AVX2 is supported by more recent machines. These embodiments are given by way of example. Other embodiments could apply to other platforms as well and similarly use their platform specific extensions in a similar manner. For purposes of the following description it is assumed that there are 32 bits/pixel RGB data and SSE2 SIMD instructions are being used. Macroblock size is assumed to be 16×16 pixels.

Referring first to FIGS. 18A and 18B then, the macroblock comparison and conditional copy may be called (STEP 1810). The macroblock (mb) array (the mb_info_hyp array) is initialized with value 1, indicating that it is assumed that all the macroblocks of the frames being compared are unmodified/static. Next, the total static count is calculated. A variable (e.g., static_mbcount) may be initialized for the frames being compared (e.g., total number of macroblocks in the X direction (totalmb_x) multiplied by the total number of macroblocks in the Y direction (totalmb_y)). The number of static macroblocks between the two frames being compared may be stored in this variable (e.g., static_mbcount). A loop counter variable (i) for the total number of macroblocks in the Y direction is also initialized (STEP 1812).
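As a worked illustration of this initialization, the following C sketch derives the macroblock counts from a frame size and computes the initial value of static_mbcount. The helper names mb_count and initial_static_mbcount are hypothetical, and rounding a partial macroblock up to a whole one is an assumption of the sketch; the description above does not state how frame dimensions that are not multiples of 16 are handled.

```c
#include <assert.h>

enum { MB_SIZE = 16 };   /* macroblock edge length in pixels */

/* Macroblocks needed to cover `pixels` pixels, rounding up (assumed). */
static int mb_count(int pixels)
{
    assert(pixels > 0);
    return (pixels + MB_SIZE - 1) / MB_SIZE;
}

/* Initial static_mbcount: totalmb_x * totalmb_y, all presumed static. */
static int initial_static_mbcount(int width, int height)
{
    int totalmb_x = mb_count(width);
    int totalmb_y = mb_count(height);
    return totalmb_x * totalmb_y;
}
```

For a 1280×720 frame this gives 80 × 45 = 3600 macroblocks, so static_mbcount would start at 3600 and be decremented once for each macroblock found to be dirty.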

The main outer loop runs on the total number of macroblocks in the Y direction, which is determined by comparing the Y direction loop counter (i) to the total number of macroblocks in the Y direction (totalmb_y) (STEP 1814). If the Y direction loop counter (i) is equal to or greater than the total number of macroblocks in the Y direction (totalmb_y), it indicates that all the macroblocks have been compared; the method may end and the value of the variable (e.g., static_mbcount) indicating the total number of static or unchanged macroblocks for the frames being compared may be returned (STEP 1848).

If, however, the Y direction loop counter is less than the total number of macroblocks in the Y direction (totalmb_y) (YES branch of STEP 1814), the reference pointer (refptr) for pointing to the (macroblock or scanline of the) previous frame buffer and the current pointer (currentptr) for pointing to the (macroblock or scanline of the) current frame buffer may be loaded into register variables (e.g., A and B respectively).

A variable (e.g., t) for an inside loop counter (e.g., for 16 rows of scanlines) may also be initialized. The macroblock array (the mb_info_hyp array) may be initialized for the current row (e.g., the row corresponding to index i) (STEP 1818). The inner loop (indexed by t) may be run on the total number of scanlines in one macroblock (e.g., 16). Accordingly, it can be checked if this loop counter (t) is less than 16 (STEP 1820). If it is not, it indicates the last scanline has been reached and the method may increment the row counter (i) (STEP 1816) and return to see if the last row has been processed (STEP 1814).

If, however, the last scanline has not been reached (YES branch of STEP 1820), a loop counter (j) for the number of macroblocks in the X direction may be initialized (STEP 1824). This last loop will thus be run on the number of macroblocks in the X direction. The total number of macroblocks may always be processed in the X direction, which is the cache direction, before moving to the next scanline. By scanning in the X direction the cache miss ratio is reduced and hence the CPU performance increased.

Accordingly, it can be checked if this X direction macroblock counter (j) is less than the total number of macroblocks in the X direction (totalmb_x) (STEP 1826) (e.g., indicating the last macroblock of the row has or has not been reached). If it is not, it indicates the last macroblock of the row has been reached and the method may increment the scanline counter (t) (STEP 1822) and return to see if the last scanline has been processed (STEP 1820). If, however, the last macroblock has not been reached (YES branch of STEP 1826), the value for the entry in the macroblock array (the mb_info_hyp array) corresponding to the macroblock associated with the current value of the macroblock loop counter (j) may be checked to determine if that macroblock has been processed (or processed and found to be “dirty” (e.g., the macroblock in the current frame is different than the corresponding macroblock in the previous frame)).

Recall that the entries in the macroblock array (the mb_info_hyp array) were initialized to 1 indicating that they are static macroblocks. Thus, if the value of the mb_info_hyp array for the current index (mb_info_hyp[j]) is equal to 1, this macroblock has not been compared (or has been previously compared and found not to be dirty). Accordingly, if the value of the mb_info_hyp array for the current index (mb_info_hyp[j]) is equal to 1 (YES branch of STEP 1830), a scanline of the macroblock of the current frame may be compared to a scanline of the corresponding macroblock of the previous frame (STEP 1832). This comparison may be done, for example, using the memcmp function of the standard C library called with the current value of the register variables (e.g., A and B) pointing to the areas of the macroblock to be compared and the number of bytes to be compared (e.g., 64) (STEP 1832). These register values (A and B) can then be incremented by 64 to advance them to point to the scanline of the next macroblock of the respective frames in the X direction (STEP 1834).

If the comparison of the scanline of the macroblocks indicates that they are equivalent (e.g., the return value from the memcmp operation is 0) (YES branch of STEP 1836), the counter for the macroblocks in the row (j) may be incremented (STEP 1828) and it can then be determined if that macroblock was the last macroblock in the row (STEP 1826). If it was the last macroblock in the row (NO branch of STEP 1826), the inside loop counter corresponding to the number of scanlines (t) may be incremented (STEP 1822) and it can be determined (STEP 1820) if that is the last scanline for that row of macroblocks.

If, however, the memcmp indicates that the scanlines in the two macroblocks being compared are not equivalent (NO branch of STEP 1836), the entry of the mb_info_hyp array for the current index (mb_info_hyp[j]) may be set to 0, indicating the macroblock is dirty, and the static_mbcount indicating the number of static macroblocks between the two frames may be decremented (STEP 1844). If a value is set (e.g., b_cpyflag) indicating that the macroblocks should be copied and the value of the mb_info_hyp array for the current index (mb_info_hyp[j]) is set to zero (YES branch of STEP 1842), the data for that scanline may be copied from the location pointed to by the current pointer (e.g., B) to the location pointed to by the reference pointer (e.g., A) (STEP 1840). Additionally, if the indexed scanline for each macroblock in the X direction has been compared, the reference pointer and the current pointer (e.g., A and B) may be incremented by a stride so that they may point to the next row of scanlines (STEP 1838). A stride may be a memory size corresponding to a number of bits utilized to store data for a row of pixels and may be, for example, dependent on the data format or architecture used by a computing device. The index (j) for the X direction macroblock counter may be incremented (STEP 1828) and again it can be checked if this X direction macroblock counter (j) is less than the total number of macroblocks in the X direction (totalmb_x) (STEP 1826) to see if the indexed scanline (e.g., indexed by t) has been compared for each macroblock in the X direction.

If, when evaluating whether the value of the mb_info_hyp array for the current index (mb_info_hyp[j]) is equal to 1, it is determined that this value is not equal to 1 (indicating that the indexed macroblock has been determined to be dirty) (NO branch of STEP 1830), the register values (A and B) can be incremented by 64 to advance them to point to the scanline of the next macroblock of the respective frames in the X direction (STEP 1846) without performing a comparison of the scanlines between the two macroblocks. If a value is set (e.g., b_cpyflag) indicating that the macroblocks should be copied and the value of the mb_info_hyp array for the current index (mb_info_hyp[j]) is set to zero (YES branch of STEP 1842), the data for that scanline may be copied from the location pointed to by the current pointer (e.g., B) to the location pointed to by the reference pointer (e.g., A) (STEP 1840). Additionally, if the indexed scanline for each macroblock in the X direction has been compared, the reference pointer and the current pointer (e.g., A and B) may be incremented by a stride so that they may point to the next row of scanlines (STEP 1838). The index (j) for the X direction macroblock counter may be incremented (STEP 1828) and again it can be checked if this X direction macroblock counter (j) is less than the total number of macroblocks in the X direction (totalmb_x) (STEP 1826) to see if the indexed scanline (e.g., indexed by t) has been compared for each macroblock in the X direction.

Moving now to FIGS. 19A and 19B, one embodiment for a method of conditionally comparing and copying macroblocks using SSE2 is depicted. The macroblock comparison and conditional copy may initially be called (STEP 1910). The macroblock (mb) array (the mb_info_hyp array) is initialized with value 1, indicating that it is assumed that all the macroblocks of the frames being compared are unmodified/static. Next, the total static count is calculated. A variable (e.g., static_mbcount) may be initialized for the frames being compared (e.g., total number of macroblocks in the X direction (totalmb_x) multiplied by the total number of macroblocks in the Y direction (totalmb_y)). The number of static macroblocks between the two frames being compared may be stored in this variable (e.g., static_mbcount). A loop counter variable (i) for the total number of macroblocks in the Y direction is also initialized (STEP 1912).

The main outer loop runs on the total number of macroblocks in the Y direction, which is determined by comparing the Y direction loop counter (i) to the total number of macroblocks in the Y direction (totalmb_y) (STEP 1914). If the Y direction loop counter (i) is equal to or greater than the total number of macroblocks in the Y direction (totalmb_y), it indicates that all the macroblocks have been compared; the method may end and the value of the variable (e.g., static_mbcount) indicating the total number of static or unchanged macroblocks for the frames being compared may be returned (STEP 1948).

If, however, the Y direction loop counter is less than the total number of macroblocks in the Y direction (totalmb_y) (YES branch of STEP 1914), the reference pointer (refptr) for pointing to the (macroblock or scanline of the) previous frame buffer and the current pointer (currentptr) for pointing to the (macroblock or scanline of the) current frame buffer may be loaded into register variables (e.g., A and B respectively). These registers may be, for example, 128 bit register (e.g., __m128i registers) variables for SSE2 SIMD.

A variable (e.g., t) for an inside loop counter (e.g., for 16 rows of scanlines) may also be initialized. The macroblock array (the mb_info_hyp array) may be initialized for the current row (e.g., the row corresponding to index i) (STEP 1918). The inner loop (indexed by t) may be run on the total number of scanlines in one macroblock (e.g., 16). Accordingly, it can be checked if this loop counter (t) is less than 16 (STEP 1920). If it is not, it indicates the last scanline has been reached and the method may increment the row counter (i) (STEP 1916) and return to see if the last row has been processed (STEP 1914).

If, however, the last scanline has not been reached (YES branch of STEP 1920), a loop counter (j) for the number of macroblocks in the X direction may be initialized (STEP 1924). This last loop will thus be run on the number of macroblocks in the X direction. The total number of macroblocks may always be processed in the X direction, which is the cache direction, before moving to the next scanline. By scanning in the X direction the cache miss ratio is reduced and hence the CPU performance increased.

Accordingly, it can be checked if this X direction macroblock counter (j) is less than the total number of macroblocks in the X direction (totalmb_x) (STEP 1926) (e.g., indicating the last macroblock of the row has or has not been reached). If it is not, it indicates the last macroblock of the row has been reached and the method may increment the scanline counter (t) (STEP 1922) and return to see if the last scanline has been processed (STEP 1920). If, however, the last macroblock has not been reached (YES branch of STEP 1926), the value for the entry in the macroblock array (the mb_info_hyp array) corresponding to the macroblock associated with the current value of the macroblock loop counter (j) may be checked to determine if that macroblock has been processed (or processed and found to be “dirty” (e.g., the macroblock in the current frame is different than the corresponding macroblock in the previous frame)).

Recall that the entries in the macroblock array (the mb_info_hyp array) were initialized to 1 indicating that they are static macroblocks. Thus, if the value of the mb_info_hyp array for the current index (mb_info_hyp[j]) is equal to 1, this macroblock has not been compared (or has been previously compared and found not to be dirty). Accordingly, if the value of the mb_info_hyp array for the current index (mb_info_hyp[j]) is equal to 1 (YES branch of STEP 1930), a scanline of the macroblock of the current frame may be compared to a scanline of the corresponding macroblock of the previous frame (STEP 1932). For SSE2, the comparison may be performed using _mm_cmpeq_epi16 intrinsic functionality. A first call checks the first 16 bytes and, if they are the same, the method compares the next 16 bytes, and so on, up to the fourth group of 16 bytes. If any of the first 16 bytes is not equal, the method skips calling _mm_cmpeq_epi16 for the remaining bytes of that macroblock and skips to the next macroblock (STEP 1932). These register values (A and B) can then be incremented to advance them to point to the scanline of the next macroblock of the respective frames in the X direction (STEP 1934). Here, the increment may be by 4 for SSE2, as the instructions for SSE2 may be 128 bit instructions (four 128-bit, i.e., 16-byte, units span the 64 bytes of one macroblock scanline).
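A 64-byte scanline comparison of this kind might be sketched with SSE2 intrinsics as follows. The sketch is illustrative only and makes several assumptions not taken from FIGS. 19A and 19B: the function name row_equal_sse2 is hypothetical, unaligned loads (_mm_loadu_si128) are used because no buffer alignment is stated, and a memcmp fallback is compiled when SSE2 is not available at build time. The equality result of _mm_cmpeq_epi16 is reduced to a scalar with _mm_movemask_epi8, which yields 0xFFFF only when all 16 bytes match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Returns true when the 64 bytes at a and b are identical. */
static bool row_equal_sse2(const uint8_t *a, const uint8_t *b)
{
    assert(a != NULL && b != NULL);
#if defined(__SSE2__)
    for (int k = 0; k < 4; k++) {            /* four 16-byte groups */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + 16 * k));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + 16 * k));
        /* Each equal 16-bit lane becomes 0xFFFF, each unequal lane 0. */
        __m128i eq = _mm_cmpeq_epi16(va, vb);
        /* One mask bit per byte: 0xFFFF means every lane matched. */
        if (_mm_movemask_epi8(eq) != 0xFFFF)
            return false;                    /* early exit, skip the rest */
    }
    return true;
#else
    return memcmp(a, b, 64) == 0;            /* portable fallback */
#endif
}
```

The early return mirrors the behavior described above, in which the remaining groups of a scanline are not compared once a difference has been found.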

If the comparison of the scanline of the macroblocks indicates that they are equivalent (e.g., the return value from the _mm_cmpeq_epi16 operation is not equal to 0) (YES branch of STEP 1936), the counter for the macroblocks in the row (j) may be incremented (STEP 1928) and it can then be determined if that macroblock was the last macroblock in the row (STEP 1926). If it was the last macroblock in the row (NO branch of STEP 1926), the inside loop counter corresponding to the number of scanlines (t) may be incremented (STEP 1922) and it can be determined (STEP 1920) if that is the last scanline for that row of macroblocks.

If, however, the comparison indicates that the scanlines of the two macroblocks being compared are not equivalent (NO branch of STEP 1936), the entry of the mb_info_hyp array for the current index (mb_info_hyp[j]) may be set to 0, indicating the macroblock is dirty, and the static_mbcount indicating the number of static macroblocks between the two frames may be decremented (STEP 1944). If a value is set (e.g., b_cpyflag) indicating that the macroblocks should be copied and the value of the mb_info_hyp array for the current index (mb_info_hyp[j]) is set to zero (YES branch of STEP 1942), the data for that scanline may be copied from the location pointed to by the current pointer (e.g., B) to the location pointed to by the reference pointer (e.g., A) (STEP 1940). Additionally, if the indexed scanline for each macroblock in the X direction has been compared, the reference pointer and the current pointer (e.g., A and B) may be incremented by a stride so that they may point to the next row of scanlines (STEP 1938). The index (j) for the X direction macroblock counter may be incremented (STEP 1928) and again it can be checked if this X direction macroblock counter (j) is less than the total number of macroblocks in the X direction (totalmb_x) (STEP 1926) to see if the indexed scanline (e.g., indexed by t) has been compared for each macroblock in the X direction.

If, when evaluating whether the value of the mb_info_hyp array for the current index (mb_info_hyp[j]) is equal to 1, it is determined that this value is not equal to 1 (indicating that the indexed macroblock has been determined to be dirty) (NO branch of STEP 1930), the register values (A and B) can be incremented (e.g., by 4 for SSE2) to advance them to point to the scanline of the next macroblock of the respective frames in the X direction (STEP 1946) without performing a comparison of the scanlines between the two macroblocks. If a value is set (e.g., b_cpyflag) indicating that the macroblocks should be copied and the value of the mb_info_hyp array for the current index (mb_info_hyp[j]) is set to zero (YES branch of STEP 1942), the data for that scanline may be copied from the location pointed to by the current pointer (e.g., B) to the location pointed to by the reference pointer (e.g., A) (STEP 1940). Additionally, if the indexed scanline for each macroblock in the X direction has been compared, the reference pointer and the current pointer (e.g., A and B) may be incremented by a stride so that they may point to the next row of scanlines (STEP 1938). The index (j) for the X direction macroblock counter may be incremented (STEP 1928) and again it can be checked if this X direction macroblock counter (j) is less than the total number of macroblocks in the X direction (totalmb_x) (STEP 1926) to see if the indexed scanline (e.g., indexed by t) has been compared for each macroblock in the X direction.

Referring to FIGS. 20A and 20B, one embodiment for a method of conditionally comparing and copying macroblocks for non-SSE2 implementations using AVX2 is depicted. The macroblock comparison and conditional copy may initially be called (STEP 2010). The macroblock (mb) array (the mb_info_hyp array) is initialized with value 1, indicating that it is assumed that all the macroblocks of the frames being compared are unmodified/static. Next, the total static count is calculated. A variable (e.g., static_mbcount) may be initialized for the frames being compared (e.g., total number of macroblocks in the X direction (totalmb_x) multiplied by the total number of macroblocks in the Y direction (totalmb_y)). The number of static macroblocks between the two frames being compared may be stored in this variable (e.g., static_mbcount). A loop counter variable (i) for the total number of macroblocks in the Y direction is also initialized (STEP 2012).

The main outer loop runs on the total number of macroblocks in the Y direction, which is determined by comparing the Y direction loop counter (i) to the total number of macroblocks in the Y direction (totalmb_y) (STEP 2014). If the Y direction loop counter (i) is equal to or greater than the total number of macroblocks in the Y direction (totalmb_y), it indicates that all the macroblocks have been compared; the method may end and the value of the variable (e.g., static_mbcount) indicating the total number of static or unchanged macroblocks for the frames being compared may be returned (STEP 2048).

If, however, the Y direction loop counter is less than the total number of macroblocks in the Y direction (totalmb_y) (YES branch of STEP 2014), the reference pointer (refptr) for pointing to the (macroblock or scanline of the) previous frame buffer and the current pointer (currentptr) for pointing to the (macroblock or scanline of the) current frame buffer may be loaded into register variables (e.g., A and B respectively). These registers may be, for example, 256 bit register (e.g., __m256i registers) variables for AVX2 SIMD.

A variable (e.g., t) for an inside loop counter (e.g., for 16 rows of scanlines) may also be initialized. The macroblock array (the mb_info_hyp array) may be initialized for the current row (e.g., the row corresponding to index i) (STEP 2018). The inner loop (indexed by t) may be run on the total number of scanlines in one macroblock (e.g., 16). Accordingly, it can be checked if this loop counter (t) is less than 16 (STEP 2020). If it is not, it indicates the last scanline has been reached and the method may increment the row counter (i) (STEP 2016) and return to see if the last row has been processed (STEP 2014).

If, however, the last scanline has not been reached (YES branch of STEP 2020), a loop counter (j) for the number of macroblocks in the X direction may be initialized (STEP 2024). This last loop will thus be run on the number of macroblocks in the X direction. The total number of macroblocks may always be processed in the X direction, which is the cache direction, before moving to the next scanline. By scanning in the X direction the cache miss ratio is reduced and hence the CPU performance increased.

Accordingly, it can be checked if this X direction macroblock counter (j) is less than the total number of macroblocks in the X direction (totalmb_x) (STEP 2026) (e.g., indicating the last macroblock of the row has or has not been reached). If it is not, it indicates the last macroblock of the row has been reached and the method may increment the scanline counter (t) (STEP 2022) and return to see if the last scanline has been processed (STEP 2020). If, however, the last macroblock has not been reached (YES branch of STEP 2026), the value for the entry in the macroblock array (the mb_info_hyp array) corresponding to the macroblock associated with the current value of the macroblock loop counter (j) may be checked to determine if that macroblock has been processed (or processed and found to be “dirty” (e.g., the macroblock in the current frame is different than the corresponding macroblock in the previous frame)).

Recall that the entries in the macroblock array (the mb_info_hyp array) were initialized to 1 indicating that they are static macroblocks. Thus, if the value of the mb_info_hyp array for the current index (mb_info_hyp[j]) is equal to 1, this macroblock has not been compared (or has been previously compared and found not to be dirty). Accordingly, if the value of the mb_info_hyp array for the current index (mb_info_hyp[j]) is equal to 1 (YES branch of STEP 2030), a scanline of the macroblock of the current frame may be compared to a scanline of the corresponding macroblock of the previous frame (STEP 2032). For AVX2, the comparison may be performed using _mm256_or_si256 and _mm256_xor_si256 intrinsic functionality. A first call checks the 64 bytes and a second call to _mm256_testz_si256 may be used to determine the compare results (STEP 2032). These register values (A and B) can then be incremented to advance them to point to the scanline of the next macroblock of the respective frames in the X direction (STEP 2034). Here, the increment may be by 2 for AVX2, as the instructions for AVX2 may be 256 bit instructions (two 256-bit, i.e., 32-byte, units span the 64 bytes of one macroblock scanline).
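One way the three named intrinsics could be combined for a 64-byte scanline is sketched below. This is an illustrative arrangement and not necessarily the one used in FIGS. 20A and 20B: the function name row_equal_avx2 is hypothetical, unaligned loads are assumed, and a memcmp fallback is compiled when the build does not enable AVX2 (for example, without -mavx2). The XOR of two equal 32-byte halves is all zeros, the OR merges the two halves into one difference register, and _mm256_testz_si256 returns 1 only when that register is entirely zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#if defined(__AVX2__)
#include <immintrin.h>   /* AVX/AVX2 intrinsics */
#endif

/* Returns true when the 64 bytes at a and b are identical. */
static bool row_equal_avx2(const uint8_t *a, const uint8_t *b)
{
    assert(a != NULL && b != NULL);
#if defined(__AVX2__)
    __m256i a0 = _mm256_loadu_si256((const __m256i *)a);
    __m256i a1 = _mm256_loadu_si256((const __m256i *)(a + 32));
    __m256i b0 = _mm256_loadu_si256((const __m256i *)b);
    __m256i b1 = _mm256_loadu_si256((const __m256i *)(b + 32));

    /* Any differing bit in either 32-byte half survives the XOR and OR. */
    __m256i diff = _mm256_or_si256(_mm256_xor_si256(a0, b0),
                                   _mm256_xor_si256(a1, b1));

    /* testz(diff, diff) is 1 exactly when diff has no bits set. */
    return _mm256_testz_si256(diff, diff) != 0;
#else
    return memcmp(a, b, 64) == 0;            /* portable fallback */
#endif
}
```

Unlike the SSE2 variant with its four groups, this arrangement inspects all 64 bytes with a single test and therefore has no early exit within a scanline.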

If the comparison of the scanline of the macroblocks indicates that they are equivalent (e.g., the return value from the _mm256_testz_si256 operation is not equal to 0) (YES branch of STEP 2036), the counter for the macroblocks in the row (j) may be incremented (STEP 2028) and it can then be determined if that macroblock was the last macroblock in the row (STEP 2026). If it was the last macroblock in the row (NO branch of STEP 2026), the inside loop counter corresponding to the number of scanlines (t) may be incremented (STEP 2022) and it can be determined (STEP 2020) if that is the last scanline for that row of macroblocks.

If, however, the comparison indicates that the scanlines of the two macroblocks being compared are not equivalent (NO branch of STEP 2036), the entry of the mb_info_hyp array for the current index (mb_info_hyp[j]) may be set to 0, indicating the macroblock is dirty, and the static_mbcount indicating the number of static macroblocks between the two frames may be decremented (STEP 2044). If a value is set (e.g., b_cpyflag) indicating that the macroblocks should be copied and the value of the mb_info_hyp array for the current index (mb_info_hyp[j]) is set to zero (YES branch of STEP 2042), the data for that scanline may be copied from the location pointed to by the current pointer (e.g., B) to the location pointed to by the reference pointer (e.g., A) (STEP 2040). Additionally, if the indexed scanline for each macroblock in the X direction has been compared, the reference pointer and the current pointer (e.g., A and B) may be incremented by a stride so that they may point to the next row of scanlines (STEP 2038). The index (j) for the X direction macroblock counter may be incremented (STEP 2028) and again it can be checked if this X direction macroblock counter (j) is less than the total number of macroblocks in the X direction (totalmb_x) (STEP 2026) to see if the indexed scanline (e.g., indexed by t) has been compared for each macroblock in the X direction.

If, when evaluating whether the value of the mb_info_hyp array for the current index (mb_info_hyp[j]) is equal to 1, it is determined that this value is not equal to 1 (indicating that the indexed macroblock has been determined to be dirty) (NO branch of STEP 2030), the register values (A and B) can be incremented (e.g., by 2 for AVX2) to advance them to point to the scanline of the next macroblock of the respective frames in the X direction (STEP 2046) without performing a comparison of the scanlines between the two macroblocks. If a value is set (e.g., b_cpyflag) indicating that the macroblocks should be copied and the value of the mb_info_hyp array for the current index (mb_info_hyp[j]) is set to zero (YES branch of STEP 2042), the data for that scanline may be copied from the location pointed to by the current pointer (e.g., B) to the location pointed to by the reference pointer (e.g., A) (STEP 2040). Additionally, if the indexed scanline for each macroblock in the X direction has been compared, the reference pointer and the current pointer (e.g., A and B) may be incremented by a stride so that they may point to the next row of scanlines (STEP 2038). The index (j) for the X direction macroblock counter may be incremented (STEP 2028) and again it can be checked if this X direction macroblock counter (j) is less than the total number of macroblocks in the X direction (totalmb_x) (STEP 2026) to see if the indexed scanline (e.g., indexed by t) has been compared for each macroblock in the X direction.

As can be seen from FIGS. 18-20, the three embodiments of methods presented herein are based on similar logic but may use different calls for the comparison operation. The following table lists the different standard and intrinsic functions used for the comparison operation for different CPU types.

TABLE 1
Macroblock Comparison Standard and Intrinsic Functions

No   CPU type   Comparison intrinsics or function
1    Default    memcmp( ) (C library function)
2    SSE2       _mm_cmpeq_epi16( ), _mm_movemask_epi8( )
3    AVX2       _mm256_or_si256, _mm256_xor_si256, _mm256_testz_si256

Partial Color Space Conversion

Returning now to FIGS. 10A and 10B, as discussed above, embodiments of a virtual mobile device platform as described herein may serve to conserve computational resources by identifying macroblocks of the original frame (e.g., original frame 1032) as unchanged (e.g., ZMV macroblocks) and, based on the identification of a macroblock as a ZMV macroblock, avoiding performing color space conversion (by color space converter 1018) on that macroblock. The previously converted or encoded version of that macroblock (e.g., encoded previous frame 1057) may then be used by the other components of the display processing pipeline to avoid unnecessary duplicative processing. It will be noted that while certain embodiments may bypass color space conversion based on identification of ZMV macroblocks according to embodiments of ZMV identification as disclosed herein, the identification of ZMV macroblocks may be done according to almost any methodology desired in conjunction with embodiments of bypassing color space conversion as discussed.

Accordingly, color space converter 1018 may be configured to perform RGBto YUV color space conversion at a pixel macroblock level (e.g., 16×16pixel macroblock) instead of at a full screen level as is typical invideo and image compression pre-processing. Recall that embodiments asdiscussed employ a virtual display pipeline that takes the frame bufferoutput (e.g., original frame 1032) from the virtual device (e.g., fromAndroid display system 1034) and then pre-processes the frame data andencodes it before sending this data out to a connected physical mobiledevice 1090. In certain embodiments, an H.264 or other block based videoencoder serves as the primary encoder (e.g., display encoder 1044). Asblock based video encoders such as H.264 encoders (e.g., an x264encoder) require frame data to exist in the YUV color space, the RGBframe data of original frame 1032 is converted to YUV frame data as apre-processing step in pre-processor 1038.

Traditionally, color space conversion (CSC) operates on a whole screenbuffer for conversion of the RGB pixel format to the YUV pixel format.In embodiments as disclosed however, color space converter 1018 isconfigured to perform this conversion at the macroblock unit level toconvert macroblocks of RGB pixel data to macroblocks of YUV pixel data.By converting the frame data at the macroblock level significantperformance optimizations may be achieved by only performing partialscreen CSC when possible (e.g., only on macroblocks that are notidentified as ZMV macroblocks as discussed).

As may be recalled from the previous disclosures, during the display pre-processing, before the CSC operation, the macroblock compare-and-conditionally-copy functionality (e.g., as accomplished by ZMV detector 1074) compares the current frame RGB data with the previous frame RGB data to find the macroblocks whose data has not changed and thus is static (e.g., macroblock type ZMV). A data structure representing the list of macroblocks that the screen is organized into is used to capture this information (e.g., in current frame metadata 1058). This data structure may be referred to as mb_info_hyp (or, interchangeably, mb_info or mb_info_array). The data structure may be an array representing the macroblocks of the current frame, where the element representing each macroblock has a Boolean value: “false” (0) represents a non-static macroblock while “true” (1) represents a static macroblock (ZMV macroblock).
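The population of such an array can be sketched in portable C using the default memcmp( ) comparison. This is a minimal illustration only, not the scanline-interleaved compare-and-conditionally-copy routine of FIGS. 18-20: the function name mark_static_macroblocks, the 4-byte ARGB pixel layout and the clipping of partial edge macroblocks are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MB_SIZE 16          /* macroblock edge in pixels */
#define BYTES_PER_PIXEL 4   /* assumption: ARGB, 8 bits per channel */

/* Hypothetical helper: writes 1 (static) or 0 (non-static) into mb_info for
 * every macroblock of the current frame and returns the number of static
 * macroblocks found. stride is the length of one scanline in bytes. */
static size_t mark_static_macroblocks(const uint8_t *cur, const uint8_t *prev,
                                      int width, int height, size_t stride,
                                      uint8_t *mb_info)
{
    assert(cur != NULL && prev != NULL && mb_info != NULL);
    assert(width > 0 && height > 0);
    int total_mb_x = (width + MB_SIZE - 1) / MB_SIZE;
    int total_mb_y = (height + MB_SIZE - 1) / MB_SIZE;
    size_t n_static = 0;

    for (int my = 0; my < total_mb_y; my++) {
        for (int mx = 0; mx < total_mb_x; mx++) {
            int x0 = mx * MB_SIZE;
            int y0 = my * MB_SIZE;
            /* Clip partial macroblocks at the right and bottom edges. */
            int w = (x0 + MB_SIZE <= width) ? MB_SIZE : width - x0;
            int h = (y0 + MB_SIZE <= height) ? MB_SIZE : height - y0;
            uint8_t is_static = 1;

            for (int row = 0; row < h; row++) {
                size_t off = (size_t)(y0 + row) * stride
                           + (size_t)x0 * BYTES_PER_PIXEL;
                if (memcmp(cur + off, prev + off,
                           (size_t)w * BYTES_PER_PIXEL) != 0) {
                    is_static = 0;  /* first differing scanline decides */
                    break;
                }
            }
            mb_info[my * total_mb_x + mx] = is_static;
            n_static += is_static;
        }
    }
    return n_static;
}
```

For a 32×16 frame in which a single pixel in the right-hand macroblock differs from the previous frame, this sketch would yield an array of {1, 0}: the left macroblock is static and the right one is not.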

In certain embodiments, the memory buffer created to store YUV data converted from RGB data (e.g., converted frame 1042) is a single statically allocated memory buffer that is only ever written to by the CSC functionality (e.g., color space converter 1018). Thus, once a list of unmodified macroblocks (ZMV macroblocks) is obtained from the RGB buffer comparison, the color space converter 1018 can simply skip over the expensive color space conversion process for these particular unmodified blocks, as the previously converted YUV data for these blocks has already been generated and written to the YUV buffer. In other words, in this embodiment the converted frame 1042 may overwrite the same memory location where original last frame 1052 in the YUV format is stored (e.g., a single statically allocated memory buffer). This bypass of CSC operations can save a significant number of CPU cycles.

As shown in FIG. 21, embodiments may utilize a data structure indicating the static macroblocks of the current frame to perform color space conversion on only those macroblocks of the current frame that are not static (e.g., those that have been changed since the previous frame). In particular, according to one embodiment, the original current frame 2132 in the RGB color space and the original previous frame 2152 in the RGB color space are compared to determine which macroblocks of the current frame 2132 are static relative to the previous frame 2152 (STEP 2150). This comparison may be accomplished, for example, using embodiments of the conditional copy and compare discussed above, or other methods of comparing may be utilized.

During this comparison a data structure 2110 may be created indicating which macroblocks of the current frame 2132 are (and are not) static relative to the previous frame 2152. For example, embodiments of the macroblock compare-and-conditionally-copy functionality discussed may populate an mb_info_hyp array which contains the static (and non-static) macroblock information. Specifically, embodiments of the conditional copy and compare functionality may utilize an mb_info_hyp array including an element corresponding to each macroblock of a frame. During the conditional copy and compare each element of the mb_info_hyp array may be populated with a value (e.g., 1 or 0) indicating if the macroblock of the current frame corresponding to that element is static or non-static. As illustrated in the example of FIG. 21, the dotted boxes (with the value 0) of the data structure 2110 represent the non-static macroblocks while the lined boxes (with the value 1) represent the static macroblocks of the current frame 2132.

This data structure 2110 may be utilized by the color space converter2118 (e.g., libyuv) so that color space conversion may only be performedby the color space converter 2118 for the macroblocks that are indicatedas being non-static (e.g., have a value 0 in the mb_info_hyp array).Specifically, YUV frame buffer 2142 may contain the previous frame(e.g., previous frame 2152) in the YUV color space. Thus, color spaceconverter 2118 may only convert the identified non-static macroblocks ofcurrent frame 2132 from the RGB color space to the YUV color space.These converted macroblocks of the current frame (e.g., current frame2132) may replace the corresponding macroblocks of the previous frame(e.g., previous frame 2152) in the YUV frame buffer (or converted framebuffer) 2142. The YUV macroblocks of the previous frame (e.g., previousframe 2152) corresponding to static macroblocks may thus be retained inthe YUV frame buffer 2142 (as they haven't changed from the previousframe 2152 to the current frame 2132), avoiding the need to performcolor space conversion on these macroblocks. The frame in the YUV framebuffer 2142 (e.g., the YUV420 frame buffer prepared by libyuv) thusrepresents the current frame 2132 in the YUV color space. The currentframe in the YUV color space in YUV frame buffer 2142 may then beutilized by video encoder 2146 (e.g., an x264 encoder) to produce anencoded frame. By bypassing color space conversion for the staticmacroblocks CPU cycles may be saved, memory usage reduced, and the speedof color space conversion increased.

In certain embodiments then, color space converter 1018 may utilize a modified standard CSC library (e.g., libyuv), configured (e.g., by adding code or interfaces) to properly and efficiently perform CSC at the macroblock level. As part of these changes, an additional parameter (e.g., a pointer to an mb_info_hyp type object) may be added to the RGB to YUV function APIs. In one particular embodiment, the RGB frame is converted to YUV using, for example, libyuv/ffmpeg code. Accordingly, the color space conversion (e.g., the libyuv code) is adapted to evaluate the mb_info array containing information about static macroblocks and skip those static macroblocks. Additionally, as there may be a big difference between the accuracy of the computations in ffmpeg and libyuv, the libyuv RGBtoYUV420 formula and conversion code is adapted to maintain the computing accuracy.

FIGS. 22A, 22B and 23 are flow charts depicting embodiments of methods for converting a frame in the RGB color space to the YUV color space. It will be noted that such methods may be implemented by modifying an existing library or conversion process (e.g., the libyuv library) to utilize a data structure indicating static or non-static macroblocks and performing the color space conversion on individual macroblocks of the current frame (e.g., thus skipping color space conversion for static macroblocks). FIGS. 22A and 22B describe one embodiment of a method for skipping CSC on ZMV macroblocks (e.g., that may include a modification of an interface (API) for libyuv code) while FIG. 23 represents an embodiment of a method for converting color space (e.g., which may include a modification done in the RGB to YUV conversion module of the library).

Here, a ZMV detector (e.g., ZMV detector 1074) or another component orcalling application may pass the color space converter (e.g., colorspace converter 1018) a reference to a macroblock data structure (e.g.,mb_info_hyp array) containing the static (or non-static) macroblockinformation for a corresponding current frame.

As has been mentioned, the size of the data structure with the macroblock data (e.g., mb_info_hyp array) may be equal to the total number of macroblocks in that frame and the component may allocate memory for, or populate, that array (e.g., with 0 for non-static macroblock positions and 1 for static macroblock positions). In one embodiment, if the calling component desires for color space conversion to be performed on the entire frame, the component can pass a NULL pointer instead of the pointer to the data structure. A loop over the height of the frame may then call the RGB to YUV conversion module on each row.

Specifically, in one embodiment, if the mb_info_hyp array is not NULL, the method may check for the 16 pixel boundary in the Y direction and if 16 pixels have already been processed in the Y direction, the mb_info_hyp array pointer can be incremented to point to the next macroblock row. The RGBtoYUV module may first check for the 16 pixel boundary in the X direction and if 16 pixels have already been processed, the index of the mb_info_hyp array can be incremented to scan for the next macroblock status. If the current macroblock being evaluated is static (e.g., the mb_info_hyp array contains 1 for the macroblock), the RGBtoYUV conversion module skips the RGBtoYUV420 conversion process for that macroblock and increments the ARGB and YUV buffer pointers and the index of the mb_info_hyp array to process the next set of pixel data.

Referring specifically now to FIGS. 22A and 22B, an API or otherinterface for the color space conversion (e.g., an ARGB to YUV420conversion) may be called by a component and passed a reference orpointer to a macroblock data structure (e.g., mb_info_hyp array)containing macroblock data indicating whether each macroblock of acurrent frame is static or non-static (STEP 2210).

Next, the total number of macroblocks in the X direction (e.g., referredto as Total_MBx or MBx) may be determined. (STEP 2220). Thisdetermination may be made to the nearest integer multiple of the pixelsof a macroblock (e.g., here 16) by adding a value corresponding to thenumber of pixels of a macroblock (e.g., here 15) to the width of theframe in pixels and dividing by the number of pixels in a macroblock. Apointer (e.g., mb_info_array pointer) to reference the macroblock datastructure or elements thereof may be initialized to point to the firstelement of the array and a row index (e.g., Y) initialized (STEP 2230).The row index can then be incremented to correspond to the next row(STEP 2270).
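The rounding described above amounts to a ceiling division by the macroblock size. A minimal sketch follows; the helper name total_macroblocks is hypothetical and is not taken from libyuv.

```c
#include <assert.h>

#define MB_SIZE 16   /* macroblock edge in pixels */

/* Hypothetical helper: number of macroblocks needed to cover `pixels`
 * pixels along one axis, rounding any partial macroblock up. */
static int total_macroblocks(int pixels)
{
    assert(pixels > 0);
    return (pixels + MB_SIZE - 1) / MB_SIZE;
}
```

Adding 15 before the integer division means that a frame width that is not a multiple of 16 still receives an entry for its final, partial macroblock; a 1080 pixel dimension, for example, maps to 68 macroblocks rather than 67.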

A loop may be run based on the height of the frame (e.g., number ofpixels in the Y direction) to call a color space conversion module(e.g., an RGB to YUV conversion module that may be included in the colorspace conversion module) on each row of macroblocks. Specifically, inone embodiment, if the data structure with the macroblock data (e.g.,mb_info_hyp array) is not NULL, the 16 pixel boundary in the Y directionmay be checked and if the 16^(th) pixel is already processed in Ydirection, the pointer may be pointed to the next macroblock row.

Specifically, a variable (Y) may be used to track the rows of pixelsthat have been processed and incremented. Thus, this variable can bechecked to see if the last pixel of the last macroblock in the Ydirection has been processed (STEP 2232). If the last row of macroblockshas been processed (YES branch of STEP 2232) the conversion process mayend. If, however, the last row of macroblocks has not been processed (NObranch of STEP 2232) it can be determined if the macroblock datastructure (e.g., mb_info_hyp array) that was referenced is NULL or isotherwise empty or invalid (STEP 2234). If so (YES branch of STEP 2234)the conversion process may be called on the entire current frame (STEP2238). Following the conversion process, one or more pointers may beupdated to indicate or refer to the converted current frame (e.g., theconverted frame in the YUV color space). (STEP 2260) This pointer, oranother reference to the buffer containing the converted current frame,may be returned.

These pointers may include an ARGB pointer for pointing to the original frame (e.g., in the RGB color space) and a YUV pointer for pointing to the converted frame (e.g., in the YUV color space). In the case where the entire frame has been converted, these pointers may be updated to point to the last row of each of the respective frames and the variable (Y) used to track the rows of pixels that have been processed may be set to a value corresponding to the last row of pixels, such that the method may end when this variable (Y) is subsequently checked (STEP 2232).

If, however, the macroblock data structure (e.g., mb_info_hyp array) isnot invalid (NO branch of STEP 2234), it can then be determined if the16^(th) pixel in the Y direction has been processed (STEP 2236). Thisstep may be desired as the libyuv may loop on the total number ofscanlines and the macroblock size may be 16×16. This determination maybe made by bitwise anding the variable (Y) used to track the rows ofpixels that have been processed with a mask (e.g., 0xF) and checking theresult.

If the 16^(th) pixel in the Y direction has not been processed (NO branch of STEP 2236) the color space conversion process (e.g., an RGB to YUV color space conversion process) can be called to perform the color space conversion on the row of pixels for the referenced macroblock(s) (STEP 2250). This call to the color space conversion process may pass the pointer (e.g., mb_info pointer) into the macroblock data structure (e.g., mb_info_hyp array) to indicate the row of macroblocks (or index thereof) for pixels that are to be converted. A pointer to the current frame may also be passed to the color space conversion process. If the 16^(th) pixel in the Y direction has been processed (YES branch of STEP 2236) the pointer into the macroblock data structure (e.g., mb_info_hyp array) is updated to point to the next row of the macroblock data structure by incrementing it by the number of macroblocks in the X direction (e.g., Total_MBx) (STEP 2240) before the color space conversion process is called to perform the color space conversion on the row of pixels for the referenced macroblock(s) (STEP 2250).
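The boundary test and pointer advance of STEPS 2236 and 2240 can be sketched as a small helper. The name mb_row_for_scanline is hypothetical, and the y > 0 guard is an assumption added so that the pointer is not advanced before the first macroblock row has been processed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the mb_info pointer used for the previous
 * scanline, returns the pointer to use for scanline y. The pointer moves
 * forward by total_mb_x entries each time a 16 pixel boundary is crossed. */
static const uint8_t *mb_row_for_scanline(const uint8_t *mb_row, int y,
                                          int total_mb_x)
{
    assert(y >= 0 && total_mb_x > 0);
    /* (y & 0xF) == 0 holds on every 16th scanline (STEP 2236). */
    if (mb_row != NULL && y > 0 && (y & 0xF) == 0)
        mb_row += total_mb_x;   /* next row of macroblock flags (STEP 2240) */
    return mb_row;              /* a NULL array stays NULL */
}
```

Called once per scanline in increasing order of y, the helper keeps the pointer on the row of flags that covers the scanline about to be converted.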

Following the conversion process, one or more pointers may be updated to indicate or refer to the converted current frame (e.g., the converted frame in the YUV color space) (STEP 2260). These pointers may include an ARGB pointer for pointing to the original frame (e.g., in the RGB color space) and a YUV pointer for pointing to the converted frame (e.g., in the YUV color space). Thus, data of the original frame may be accessed using the ARGB pointer and converted pixel data written into the converted frame using the YUV pointer. At this point these pointers are incremented or otherwise updated to point to the next row of pixels in the respective frames (e.g., the memory locations used to hold the original frame and the converted frame). The variable (Y) used to track the rows of pixels that have been processed may be incremented (STEP 2270) and checked again (STEP 2232) to determine if the last row of pixels has been converted.

Moving on to FIG. 23, a flow diagram for one embodiment of a method for a color space conversion process that may be used with certain embodiments is depicted. In an embodiment of a color space conversion process an ARGBtoYUV module of a color space converter (e.g., color space converter 1018) may first check the 16 pixel boundary in the X direction and if 16 pixels have already been processed, the index of the macroblock data structure (e.g., mb_info_hyp array) may be incremented to scan the status of the next macroblock. If the current macroblock being processed is static (e.g., the mb_info_hyp array contains 1 for the macroblock), the color space conversion (e.g., ARGBtoYUV conversion) for that macroblock may be skipped and the ARGB and YUV buffer pointers and the index of the mb_info_hyp array incremented to process the next set of pixel data.

Specifically, in one embodiment, the color space conversion method maybe an RGB to YUV conversion process that may be called with a pointer orother reference to a row of macroblocks that are to be color spaceconverted (STEP 2310). This reference may include a reference to amacroblock data structure (e.g., mb_info_hyp array) indicating thecorresponding row of macroblocks of the current frame to be converted.Reference may also include pointers (e.g., ARGB or YUV pointers)referencing locations in the original frame of data (e.g., in the RGBformat) or a location where converted data (e.g., YUV data for thepixels) is to be stored. A pixel variable (X) may be initialized (e.g.,to 0) to track the macroblocks or the pixels of the row. A macroblockindex (mb_index) that may be used to indicate the macroblock of the rowin the X direction may also be initialized. This pixel variable (X) maythus be incremented by the number of pixels in a macroblock in the Xdirection (e.g., 16) (STEP 2315).

This pixel variable (X) can therefore be checked to determine if thelast macroblock of the row of macroblocks has been reached by comparingit to the width of the frame (e.g., the number of pixels of the framesize in the X direction) (STEP 2320). If the last macroblock of the rowhas been reached (e.g., the variable (x) is equal to or greater than thewidth) the color space conversion for the row of macroblocks may end(YES branch of STEP 2320).

If the last macroblock of the row has not been reached (NO branch of STEP 2320), it can be determined if the 16 pixel boundary for a macroblock has been reached (STEP 2330). This may be done by checking to see if any remainder is obtained when dividing the pixel variable (X) by 16 (e.g., the pixel width of each macroblock). If the last pixel of a macroblock has not been processed (NO branch of STEP 2330) (e.g., if there is a remainder when dividing the pixel variable (X) by 16), the variable tracking the current macroblock being processed (mb_index) may be incremented such that the next macroblock in the row can be indexed (STEP 2340). The macroblock data structure may then be indexed based on the macroblock tracking variable (mb_index) to determine the value associated with the corresponding macroblock in the macroblock data structure. Based on that value it can be determined if the macroblock of the corresponding frame is a static macroblock or a non-static macroblock (e.g., relative to the previous frame) (STEP 2350). For example, as discussed, the macroblock data structure (mb_info_hyp array) may contain a 0 if the corresponding macroblock of the current frame is non-static (e.g., has changed relative to the previous frame) and a 1 if the corresponding macroblock of the current frame is static (e.g., has not changed relative to the previous frame).

If the indexed macroblock of the row is non-static (YES branch of STEP 2350), an RGB to YUV color space conversion may be performed on that macroblock (STEP 2360). The converted macroblock can then be stored in the appropriate location in a converted frame buffer as previously discussed. As noted above, a YUV pointer may be used to point to the memory location where the converted data is to be written. The pointers (e.g., the ARGB pointer for the original frame data and the YUV pointer for the converted frame) can then be updated (STEP 2370) before the pixel index is again checked (STEP 2320). If, however, the indexed macroblock of the row is static (NO branch of STEP 2350), the RGB to YUV color space conversion may be skipped for that macroblock and the pointers updated (STEP 2370) before the pixel index is again checked (STEP 2320).
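The per-row skipping logic of FIG. 23 can be illustrated for the luma (Y) plane alone. The following is a sketch under stated assumptions rather than the modified libyuv code: the function names are hypothetical, the pixel byte order is assumed to be B, G, R, A, and the U and V planes (which are subsampled in YUV420) are omitted for brevity. The luma formula is the libyuv one quoted in this description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MB_SIZE 16   /* macroblock width in pixels */

/* Luma formula as quoted from libyuv in this description. */
static int rgb_to_y(uint8_t r, uint8_t g, uint8_t b)
{
    return (66 * r + 129 * g + 25 * b + 0x1080) >> 8;
}

/* Hypothetical row converter: converts one scanline of ARGB pixels to Y,
 * skipping macroblocks flagged as static. mb_row points at the mb_info
 * entries for this macroblock row; NULL means "convert every pixel".
 * Assumption: each pixel is stored as the bytes B, G, R, A. */
static void argb_row_to_y(const uint8_t *argb, uint8_t *dst_y, int width,
                          const uint8_t *mb_row)
{
    assert(argb != NULL && dst_y != NULL && width > 0);
    int mb_index = 0;
    for (int x = 0; x < width; x += MB_SIZE, mb_index++) {
        /* Clip a partial macroblock at the right edge of the frame. */
        int n = (width - x < MB_SIZE) ? (width - x) : MB_SIZE;
        if (mb_row != NULL && mb_row[mb_index] == 1)
            continue;   /* static: previously converted Y data stays in place */
        for (int i = x; i < x + n; i++) {
            const uint8_t *px = argb + 4 * i;
            dst_y[i] = (uint8_t)rgb_to_y(px[2], px[1], px[0]);
        }
    }
}
```

Because the destination row belongs to a buffer that persists between frames, the pixels of a skipped macroblock simply keep the values written when that macroblock was last converted.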

In one embodiment, from ITU-R BT.601, RGB to YPbPr is derived as follows:

Y = 0.299 R + 0.587 G + 0.114 B
P_B = −0.168 R − 0.331 G + 0.5 B
P_R = 0.5 R − 0.4186 G − 0.0813 B

Digital Y′CbCr (8 bits per sample) is derived from analog R′G′B′ as follows:

Y = 16 + (65.481 R + 128.533 G + 24.966 B)
C_b = 128 + (−37.797 R − 74.203 G + 112 B)
C_r = 128 + (112 R − 93.786 G − 18.214 B)

The resultant signals range from 16 to 235 for Y′ (Cb and Cr range from 16 to 240).
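As a reference point for the fixed point code discussed next, the digital Y′CbCr equations can be evaluated directly in floating point. This is an illustrative sketch only: the type and function names are hypothetical, the inputs are assumed to be normalized to the range 0.0 to 1.0, the coefficients are copied as printed above, and rounding to the nearest integer is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for one 8-bit Y'CbCr sample. */
typedef struct { uint8_t y, cb, cr; } ycbcr8;

/* Evaluates the digital Y'CbCr equations for R'G'B' values normalized to
 * [0.0, 1.0]. Results are rounded to nearest by adding 0.5 and truncating,
 * which is valid here because every result is positive. */
static ycbcr8 rgb_to_ycbcr_bt601(double r, double g, double b)
{
    assert(r >= 0.0 && r <= 1.0);
    assert(g >= 0.0 && g <= 1.0);
    assert(b >= 0.0 && b <= 1.0);
    double y  =  16.0 + ( 65.481 * r + 128.533 * g +  24.966 * b);
    double cb = 128.0 + (-37.797 * r -  74.203 * g + 112.0   * b);
    double cr = 128.0 + (112.0   * r -  93.786 * g -  18.214 * b);
    ycbcr8 out = { (uint8_t)(y + 0.5), (uint8_t)(cb + 0.5), (uint8_t)(cr + 0.5) };
    return out;
}
```

With these equations black maps to (16, 128, 128) and white to (235, 128, 128), which matches the stated 16 to 235 range for Y′.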

Libyuv conversion native original C code:

    static __inline int RGBToY(uint8 r, uint8 g, uint8 b) {
      return (66 * r + 129 * g + 25 * b + 0x1080) >> 8;
    }
    static __inline int RGBToU(uint8 r, uint8 g, uint8 b) {
      return (112 * b - 74 * g - 38 * r + 0x8080) >> 8;
    }
    static __inline int RGBToV(uint8 r, uint8 g, uint8 b) {
      return (112 * r - 94 * g - 18 * b + 0x8080) >> 8;
    }

The computations in libyuv using the above formula can differ significantly in accuracy from the reference formula. Ffmpeg, in this respect, is very accurate in computing the Y, U and V values. The worst case magnitude of error for libyuv can be computed as follows (shown for the Y value), considering an RGB value of 0xFFFFFF (white):

Worst case error introduced using the R component = 0.299*(219/255)*255 − ((33*255)>>7) = 0.48
Worst case error introduced using the G component = 0.587*(219/255)*255 − ((65*255)>>7) = 0.94
Worst case error introduced using the B component = 0.114*(219/255)*255 − ((13*255)>>7) = 0.1

The errors may add up. In the worst case, pixel values computed with libyuv may be off by ±2 for 8 bit pixels, which is a significant error. To fix this accuracy error, the libyuv conversion native C and SIMD code may be modified to take the fractional part of the formula into the calculation. For example:

The original libyuv code implements the following formula for RGB to Y:

    static __inline int RGBToY(uint8 r, uint8 g, uint8 b) {
      return (66 * r + 129 * g + 25 * b + 0x1080) >> 8;
    }

The above code is modified to fix the accuracy issue for C code as follows:

    static __inline int RGBToY(uint8 r, uint8 g, uint8 b) {
      return ((65 * r + 128 * g + 24 * b + 0x1080) +
              ((62 * r + 68 * g + 124 * b) >> 7)) >> 8;
    }

To avoid overflow in SIMD operations, the above formula is adjusted as follows:

    ((32 * r + 64 * g + 12 * b + 0x1080) + ((47 * r + 17 * g + 31 * b) >> 8)) >> 7

Both of the above fixed point formulas closely map to the original formula:

    Y = 16 + (65.481 R + 128.533 G + 24.966 B)

The same modification is made for the U and V components.

Although the invention has been described with respect to specificembodiments thereof, these embodiments are merely illustrative, and notrestrictive of the invention. The description herein of illustratedembodiments of the invention, including the description in the Abstractand Summary, is not intended to be exhaustive or to limit the inventionto the precise forms disclosed herein (and in particular, the inclusionof any particular embodiment, feature or function within the Abstract orSummary is not intended to limit the scope of the invention to suchembodiment, feature or function). Rather, the description is intended todescribe illustrative embodiments, features and functions in order toprovide a person of ordinary skill in the art context to understand theinvention without limiting the invention to any particularly describedembodiment, feature or function, including any such embodiment featureor function described in the Abstract or Summary. While specificembodiments of, and examples for, the invention are described herein forillustrative purposes only, various equivalent modifications arepossible within the spirit and scope of the invention, as those skilledin the relevant art will recognize and appreciate. As indicated, thesemodifications may be made to the invention in light of the foregoingdescription of illustrated embodiments of the invention and are to beincluded within the spirit and scope of the invention. Thus, while theinvention has been described herein with reference to particularembodiments thereof, a latitude of modification, various changes andsubstitutions are intended in the foregoing disclosures, and it will beappreciated that in some instances some features of embodiments of theinvention will be employed without a corresponding use of other featureswithout departing from the scope and spirit of the invention as setforth. 
Therefore, many modifications may be made to adapt a particularsituation or material to the essential scope and spirit of theinvention.

Reference throughout this specification to “one embodiment”, “anembodiment”, or “a specific embodiment” or similar terminology meansthat a particular feature, structure, or characteristic described inconnection with the embodiment is included in at least one embodimentand may not necessarily be present in all embodiments. Thus, respectiveappearances of the phrases “in one embodiment”, “in an embodiment”, or“in a specific embodiment” or similar terminology in various placesthroughout this specification are not necessarily referring to the sameembodiment. Furthermore, the particular features, structures, orcharacteristics of any particular embodiment may be combined in anysuitable manner with one or more other embodiments. It is to beunderstood that other variations and modifications of the embodimentsdescribed and illustrated herein are possible in light of the teachingsherein and are to be considered as part of the spirit and scope of theinvention.

In the description herein, numerous specific details are provided, suchas examples of components and/or methods, to provide a thoroughunderstanding of embodiments of the invention. One skilled in therelevant art will recognize, however, that an embodiment may be able tobe practiced without one or more of the specific details, or with otherapparatus, systems, assemblies, methods, components, materials, parts,and/or the like. In other instances, well-known structures, components,systems, materials, or operations are not specifically shown ordescribed in detail to avoid obscuring aspects of embodiments of theinvention. While the invention may be illustrated by using a particularembodiment, this is not and does not limit the invention to anyparticular embodiment and a person of ordinary skill in the art willrecognize that additional embodiments are readily understandable and area part of this invention.

Embodiments discussed herein can be implemented in a computercommunicatively coupled to a network (for example, the Internet),another computer, or in a standalone computer. As is known to thoseskilled in the art, a suitable computer can include a central processingunit (“CPU”), at least one read-only memory (“ROM”), at least one randomaccess memory (“RAM”), at least one hard drive (“HD”), and one or moreinput/output (“I/O”) device(s). The I/O devices can include a keyboard,monitor, printer, electronic pointing device (for example, mouse,trackball, stylus, touch pad, etc.), or the like. In embodiments of theinvention, the computer has access to at least one database over thenetwork.

ROM, RAM, and HD are computer memories for storing computer-executableinstructions executable by the CPU or capable of being compiled orinterpreted to be executable by the CPU. Suitable computer-executableinstructions may reside on a computer readable medium (e.g., ROM, RAM,and/or HD), hardware circuitry or the like, or any combination thereof.Within this disclosure, the term “computer readable medium” is notlimited to ROM, RAM, and HD and can include any type of data storagemedium that can be read by a processor. For example, a computer-readablemedium may refer to a data cartridge, a data backup magnetic tape, afloppy diskette, a flash memory drive, an optical data storage drive, aCD-ROM, ROM, RAM, HD, or the like. The processes described herein may beimplemented in suitable computer-executable instructions that may resideon a computer readable medium (for example, a disk, CD-ROM, a memory,etc.). Alternatively, the computer-executable instructions may be storedas software code components on a direct access storage device array,magnetic tape, floppy diskette, optical storage device, or otherappropriate computer-readable medium or storage device.

Any suitable programming language can be used to implement the routines,methods or programs of embodiments of the invention described herein,including C, C++, Java, JavaScript, HTML, or any other programming orscripting code, etc. Other software/hardware/network architectures maybe used. For example, the functions of the disclosed embodiments may beimplemented on one computer or shared/distributed among two or morecomputers in or across a network. Communications between computersimplementing embodiments can be accomplished using any electronic,optical, radio frequency signals, or other suitable methods and tools ofcommunication in compliance with known network protocols.

Different programming techniques can be employed such as procedural orobject oriented. Any particular routine can execute on a single computerprocessing device or multiple computer processing devices, a singlecomputer processor or multiple computer processors. Data may be storedin a single storage medium or distributed through multiple storagemediums, and may reside in a single database or multiple databases (orother data storage techniques). Although the steps, operations, orcomputations may be presented in a specific order, this order may bechanged in different embodiments. In some embodiments, to the extentmultiple steps are shown as sequential in this specification, somecombination of such steps in alternative embodiments may be performed atthe same time. The sequence of operations described herein can beinterrupted, suspended, or otherwise controlled by another process, suchas an operating system, kernel, etc. The routines can operate in anoperating system environment or as stand-alone routines. Functions,routines, methods, steps and operations described herein can beperformed in hardware, software, firmware or any combination thereof.

Embodiments described herein can be implemented in the form of controllogic in software or hardware or a combination of both. The controllogic may be stored in an information storage medium, such as acomputer-readable medium, as a plurality of instructions adapted todirect an information processing device to perform a set of stepsdisclosed in the various embodiments. Based on the disclosure andteachings provided herein, a person of ordinary skill in the art willappreciate other ways and/or methods to implement the invention.

It is also within the spirit and scope of the invention to implement in software programming or code any of the steps, operations, methods, routines or portions thereof described herein, where such software programming or code can be stored in a computer-readable medium and can be operated on by a processor to permit a computer to perform any of the steps, operations, methods, routines or portions thereof described herein. The invention may be implemented by using software programming or code in one or more general purpose digital computers, or by using application specific integrated circuits, programmable logic devices or field programmable gate arrays; optical, chemical, biological, quantum or nanoengineered systems, components and mechanisms may also be used. In general, the functions of the invention can be achieved by any means as is known in the art. For example, distributed or networked systems, components and circuits can be used. In another example, communication or transfer (or otherwise moving from one place to another) of data may be wired, wireless, or by any other means.

A “computer-readable medium” may be any medium that can contain, store,communicate, propagate, or transport the program for use by or inconnection with the instruction execution system, apparatus, system ordevice. The computer readable medium can be, by way of example only butnot by limitation, an electronic, magnetic, optical, electromagnetic,infrared, or semiconductor system, apparatus, system, device,propagation medium, or computer memory. Such computer-readable mediumshall generally be machine readable and include software programming orcode that can be human readable (e.g., source code) or machine readable(e.g., object code). Examples of non-transitory computer-readable mediacan include random access memories, read-only memories, hard drives,data cartridges, magnetic tapes, floppy diskettes, flash memory drives,optical data storage devices, compact-disc read-only memories, and otherappropriate computer memories and data storage devices. In anillustrative embodiment, some or all of the software components mayreside on a single server computer or on any combination of separateserver computers. As one skilled in the art can appreciate, a computerprogram product implementing an embodiment disclosed herein may compriseone or more non-transitory computer readable media storing computerinstructions translatable by one or more processors in a computingenvironment.

A “processor” includes any hardware system, mechanism or component that processes data, signals or other information. A processor can include a system with a general-purpose central processing unit, multiple processing units, dedicated circuitry for achieving functionality, or other systems. Processing need not be limited to a geographic location, or have temporal limitations. For example, a processor can perform its functions in “real-time,” “offline,” in a “batch mode,” etc. Portions of processing can be performed at different times and at different locations, by different (or the same) processing systems.

It will also be appreciated that one or more of the elements depicted in the drawings/figures can also be implemented in a more separated or integrated manner, or even removed or rendered as inoperable in certain cases, as is useful in accordance with a particular application. Additionally, any signal arrows in the drawings/figures should be considered only as exemplary, and not limiting, unless otherwise specifically noted.

As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having,” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, product, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, product, article, or apparatus.

Furthermore, the term “or” as used herein is generally intended to mean “and/or” unless otherwise indicated. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present). As used herein, a term preceded by “a” or “an” (and “the” when antecedent basis is “a” or “an”) includes both singular and plural of such term, unless clearly indicated otherwise (i.e., that the reference “a” or “an” clearly indicates only the singular or only the plural). Also, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.

What is claimed is:
 1. A system, comprising: a virtual device platform coupled to a physical device over a network, the virtual device platform including a processor executing instructions on a non-transitory computer readable medium for implementing: a virtual machine executing a virtual device associated with the physical device communicating with the virtual device platform over the network, the virtual device including an operating system (OS) and one or more applications executing on the OS, the OS generating a frame of display data from an application executing on the OS; and a video encoder for generating a converted frame by performing color space conversion on the frame of display data and for generating an encoded frame by encoding the converted frame, wherein the generation of the frame of display data is triggered in response to storing another encoded frame in a buffer or a memory location associated with the virtual device platform, the another encoded frame being associated with another frame of display data, wherein the instructions also include instructions for: sending, by the virtual device platform, the encoded frame to the physical device, wherein a rate at which the video encoder generates converted frames is based on a count of consecutive identical frames, and wherein the count is increased if an amount of time elapsed between reception of two consecutive identical frames by the video encoder is less than a predetermined amount of time and the count is reset to a predetermined count if the amount of time elapsed between the reception of the two consecutive identical frames by the video encoder is greater than or equal to the predetermined amount of time.
 2. The system of claim 1, wherein the video encoder is executing on the OS.
 3. The system of claim 1, wherein generation of frames by the OS is blocked after the frame is generated.
 4. The system of claim 3, wherein the OS includes a display control process that is blocked after the generation of the frame and the display control process includes a display control synchronizer responsive to an output of the video encoder such that the generation of the encoded frame causes the display control synchronizer to unblock the display control process.
 5. The system of claim 4, wherein the display control synchronizer includes a mutex or a semaphore.
 6. The system of claim 3, wherein the OS includes a display control process that controls the generation of the frame of display data and the generation of the frames is blocked by configuring VSYNC of the display control process according to a timer.
 7. The system of claim 1, wherein an input/output (I/O) thread of the video encoder that generates the converted frame is blocked until the frame of display data is generated by the OS.
 8. The system of claim 1, wherein the OS includes a display control process having a display control synchronizer, and an input/output (I/O) thread of the video encoder that generates the converted frame includes an I/O synchronizer for unblocking the I/O thread after receiving a notification from the display control synchronizer that the frame of display data was generated and blocking the I/O thread after the generation of the converted frame.
 9. The system of claim 1, wherein the another encoded frame is written into the buffer or the memory location prior to transmission of the another encoded frame to the physical device.
 10. A system, comprising: a virtual device platform coupled to a physical device over a network, the virtual device platform including a processor executing instructions on a non-transitory computer readable medium for implementing: a virtual machine executing a virtual device associated with the physical device communicating with the virtual device platform over the network, the virtual device including an operating system (OS) and one or more applications executing on the OS, the OS generating a frame of display data from an application executing on the OS; and a video encoder for generating a converted frame by performing color space conversion on the frame of display data and for generating an encoded frame by encoding the converted frame, wherein the video encoder governs a rate at which the video encoder generates converted frames based on a detection of identical frame; wherein the instructions also include instructions for: sending, by the virtual device platform, the encoded frame to the physical device, wherein the video encoder maintains an identical frame counter, slows the rate to a first rate when the identical frame counter reaches a first threshold, and slows the rate to a second rate when the identical frame counter reaches a second threshold, wherein the identical frame counter includes a count of consecutive identical frames, and wherein the count is increased if an amount of time elapsed between reception of two consecutive identical frames by the video encoder is less than a predetermined amount of time and the count is reset to a predetermined count if the amount of time elapsed between the reception of the two consecutive identical frames by the video encoder is greater than or equal to the predetermined amount of time.
 11. The system of claim 10, wherein frames generated by the OS includes a first frame and a second frame and the video encoder compares the first frame to the second frame to detect identical frames.
 12. A method, comprising: at a virtual device platform coupled to a physical device over a network, executing a virtual device on a virtual machine, the virtual device associated with the physical device communicating with the virtual device platform over the network, the virtual device including an operating system (OS) and one or more applications executing on the OS, the OS generating a frame of display data from an application executing on the OS; generating a converted frame by a video encoder by performing color space conversion on the frame of display data; generating an encoded frame by the video encoder by encoding the converted frame, wherein the generation of the frame of display data is triggered in response to storing another encoded frame in a buffer or a memory location associated with the virtual device platform, the another encoded frame being associated with another frame of display data; and sending, by the virtual device platform, the encoded frame to the physical device, wherein a rate at which the video encoder generates converted frames is based on a count of consecutive identical frames, and wherein the count is increased if an amount of time elapsed between reception of two consecutive identical frames by the video encoder is less than a predetermined amount of time and the count is reset to a predetermined count if the amount of time elapsed between the reception of the two consecutive identical frames by the video encoder is greater than or equal to the predetermined amount of time.
 13. The method of claim 12, wherein the video encoder is executing on the OS.
 14. The method of claim 12, wherein generation of frames by the OS is blocked after the frame is generated.
 15. The method of claim 14, wherein the OS includes a display control process that is blocked after the generation of the frame and the display control process includes a display control synchronizer responsive to an output of the video encoder such that the generation of the encoded frame causes the display control synchronizer to unblock the display control process.
 16. The method of claim 15, wherein the display control synchronizer includes a mutex or a semaphore.
 17. The method of claim 14, wherein the OS includes a display control process that controls the generation of the frame of display data and the generation of the frames is blocked by configuring VSYNC of the display control process according to a timer.
 18. The method of claim 12, wherein an input/output (I/O) thread of the video encoder that generates the converted frame is blocked until the frame of display data is generated by the OS.
 19. The method of claim 12, wherein the OS includes a display control process having a display control synchronizer, and an input/output (I/O) thread of the video encoder that generates the converted frame includes an I/O synchronizer for unblocking the I/O thread after receiving a notification from the display control synchronizer that the frame of display data was generated and blocking the I/O thread after the generation of the converted frame.
 20. A method, comprising: at a virtual device platform coupled to a physical device over a network, executing a virtual device on a virtual machine, the virtual device associated with the physical device communicating with the virtual device platform over the network, the virtual device including an operating system (OS) and one or more applications executing on the OS, the OS generating a frame of display data from an application executing on the OS; generating a converted frame by a video encoder by performing color space conversion on the frame of display data; generating an encoded frame by the video encoder by encoding the converted frame, wherein the video encoder governs a rate at which the video encoder generates converted frames based on a detection of identical frames; sending, by the virtual device platform, the encoded frame to the physical device, maintaining, by the video encoder, an identical frame counter; slowing, by the video encoder, the rate to a first rate when the identical frame counter reaches a first threshold; and slowing, by the video encoder, the rate to a second rate when the identical frame counter reaches a second threshold, wherein the identical frame counter includes a count of consecutive identical frames, and wherein the count is increased if an amount of time elapsed between reception of two consecutive identical frames by the video encoder is less than a predetermined amount of time and the count is reset to a predetermined count if the amount of time elapsed between the reception of the two consecutive identical frames by the video encoder is greater than or equal to the predetermined amount of time.
 21. The method of claim 20, wherein frames generated by the OS includes a first frame and a second frame and the video encoder compares the first frame to the second frame to detect identical frames.
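
By way of non-limiting illustration only, the identical frame counter recited in claims 10 and 20 might be sketched in Python as follows. The class name `IdenticalFrameGovernor`, its field names, and every numeric default (thresholds, rates, the predetermined amount of time and the predetermined count) are hypothetical and are not taken from the disclosure. The handling of a changed frame, and of the first identical frame in a run, is likewise an assumption, since the claims address only the case of two consecutive identical frames.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdenticalFrameGovernor:
    """Illustrative rate governor driven by a count of consecutive identical frames."""

    # All values below are hypothetical placeholders for the "predetermined"
    # quantities and thresholds named in the claims.
    max_gap_s: float = 0.5          # predetermined amount of time
    reset_count: int = 0            # predetermined count
    first_threshold: int = 10
    second_threshold: int = 50
    full_rate_fps: float = 60.0
    first_rate_fps: float = 15.0
    second_rate_fps: float = 2.0
    count: int = 0
    last_identical_at: Optional[float] = None

    def on_frame(self, identical: bool, now: float) -> float:
        """Update the counter for one received frame and return the governed rate."""
        if not identical:
            # Assumption: a changed frame ends the run and restores the full rate.
            self.count = self.reset_count
            self.last_identical_at = None
        else:
            prev = self.last_identical_at
            if prev is not None and now - prev < self.max_gap_s:
                # Two consecutive identical frames arrived close together.
                self.count += 1
            else:
                # Gap >= predetermined time (or, by assumption, no prior
                # identical frame): reset to the predetermined count.
                self.count = self.reset_count
            self.last_identical_at = now
        return self.rate()

    def rate(self) -> float:
        """Map the current count to a conversion rate in frames per second."""
        if self.count >= self.second_threshold:
            return self.second_rate_fps
        if self.count >= self.first_threshold:
            return self.first_rate_fps
        return self.full_rate_fps


if __name__ == "__main__":
    governor = IdenticalFrameGovernor(max_gap_s=1.0, first_threshold=3, second_threshold=5)
    for t in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 2.0):
        print(t, governor.on_frame(True, t))
```

In this sketch the rate steps down once at the first threshold and again at the second threshold, and a sufficiently long gap between identical frames returns the counter to the predetermined count. Other mappings from count to rate would be equally consistent with the claim language.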